Welcome to the Ada for micro:bit series where we look at simple examples to learn how to program the BBC micro:bit with Ada.

In this first part we will see how to set up an Ada development environment for the micro:bit.

The micro:bit

The micro:bit is a very small ARM Cortex-M0 board designed by the BBC for computer education. It's fitted with a Nordic nRF51 Bluetooth-enabled microcontroller and an embedded programmer. You can get it at:

The projects in this series will also require basic electronic components (LEDs, resistors, potentiometer, buzzer). If you don't have those items, we recommend one of the micro:bit starter kits like the Kitronik Inventor's Kit:

Installation

You will need both the x86_64 (native) package and the arm-elf (cross) package.

Once you have installed the two packages, you can download the sources of the Ada_Drivers_Library project: here. Unzip the archive in your Documents folder, for instance.

Linux only

On Linux, you might need extra privileges to access the USB programmer of the micro:bit; without them, the flash program will say "No connected boards".

To do this on Ubuntu, you can create (as administrator) the file /etc/udev/rules.d/mbed.rules and add the line:

SUBSYSTEM=="usb", ATTR{idVendor}=="0d28", ATTR{idProduct}=="0204", MODE="0666"

then reload the udev rules:

$ sudo udevadm trigger

First program

Start the GNAT Studio development environment that you installed earlier, click on "Open Project" and select the file "Ada_Drivers_Library-master/examples/MicroBit/text_scrolling/text_scrolling.gpr" from the archive that you extracted earlier.

Click on the "Build all" icon in the toolbar to compile the project. Plug in your micro:bit using a micro USB cable. And finally, click on the "Flash to board" icon in the toolbar to run the program on the micro:bit. You should see text scrolling on the LEDs of the micro:bit.

That's it for the setup of your Ada development environment for the micro:bit. See you next week for another Ada project on the micro:bit.

Don't miss out on the opportunity to put Ada into action by taking part in the fifth annual Make with Ada competition! We're calling on developers across the globe to build cool embedded applications using the Ada and SPARK programming languages, and are offering over $9,000 in total prizes. Find out more and register today!

GNATcoverage: getting started with instrumentation
Pierre-Marie de Rodat, Thu, 10 Sep 2020
https://blog.adacore.com/gnatcoverage-getting-started-with-instrumentation

This is the second post of a series about GNATcoverage and source code instrumentation. The previous post introduced how GNATcoverage worked originally and why we extended it to support source instrumentation-based code coverage computation. Let’s now see it in action in the most simple case: a basic program running on the host machine, i.e. the Linux/Windows machine that runs GNATcoverage itself.

Source traces handling

Here is a bit of context to fully understand the next section. In the original GNATcoverage scheme, coverage is inferred from low level execution trace files (“*.trace”) produced by the execution environment. These traces essentially contain a summary of program machine instructions that were executed. We call these “binary traces”, as the information they refer to is binary (machine) code.

With the new scheme, based on the instrumentation of source code, it is instead the goal of each instrumented program to create trace files. This time, the information in traces refers directly to source constructs (declarations, statements, IF conditions, …), so we call them “source traces” (“*.srctrace” files).
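To make the idea concrete, here is a toy sketch of the concept: some metadata identifying a unit, plus one bit per coverage obligation, packed into a compact byte string. This is purely illustrative Python of my own; the real *.srctrace format is a binary encoding internal to GNATcoverage, and these function names are invented.

```python
# Toy model of a "source trace": unit metadata + one bit per obligation.
# NOT GNATcoverage's actual format -- just an illustration of the idea.

def encode_trace(unit_name, obligations):
    """Pack a unit name and per-obligation booleans into a byte string."""
    bits = 0
    for i, satisfied in enumerate(obligations):
        if satisfied:
            bits |= 1 << i                    # one bit per obligation
    n_bytes = (len(obligations) + 7) // 8
    return (unit_name.encode() + b"\0"        # metadata: which unit
            + len(obligations).to_bytes(2, "little")
            + bits.to_bytes(n_bytes, "little"))

def decode_trace(blob):
    """Recover the unit name and the obligation flags."""
    name, rest = blob.split(b"\0", 1)
    count = int.from_bytes(rest[:2], "little")
    bits = int.from_bytes(rest[2:], "little")
    return name.decode(), [bool(bits >> i & 1) for i in range(count)]
```

The compact bit-packing is what makes the format non-trivial to produce by hand, and it is exactly the kind of work the "runtime for instrumented programs" takes care of.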

The data stored in these files is conceptually simple: some metadata to identify the sources to cover and a sequence of booleans that indicate whether each coverage obligation is satisfied. However, for efficiency reasons, instrumented programs must encode this information in source trace files using a compact format, which is not trivial to produce. To assist instrumented programs in this task, GNATcoverage provides a “runtime for instrumented programs” as a library project: gnatcov_rts_full.gpr, for native programs which have access to a full runtime (we will cover embedded targets in a future post).

Requirements

First, the GNATcoverage instrumenter needs a project file that properly describes the closure of source files to instrument as well as the program main unit. This is similar to what a compiler needs: access to all the dependencies of a source file in order to compile it.

Next, this blog series assumes the use of a recent GPRbuild (release 20 or later), for its support of two switches specifically introduced to facilitate building instrumented sources without modifying project files. What the new options do is conceptually simple, so it would be possible to build without them, just less convenient.

Then the source constructs added by the instrumentation expect an Ada 95 compiler. The instrumenter makes several compiler-specific assumptions (for instance when handling Pure/Preelaborate units), so for now we recommend using a recent GNAT compiler.

Finally, users need to build and install the “runtime for instrumented programs” described in the previous section. To make sure the library code can be linked with the program to analyze, the library first needs to be built with the same toolchain, then installed:

# Create a working copy of the runtime project.
# This assumes that GNATcoverage was installed
# in the /install/path/ directory.
$ rm -rf /tmp/gnatcov_rts
$ cp -r /install/path/share/gnatcoverage/gnatcov_rts /tmp/gnatcov_rts
$ cd /tmp/gnatcov_rts

# Build the gnatcov_rts_full.gpr project and install it in
# gprinstall’s default prefix (most likely where the toolchain is installed).
$ gprbuild -Pgnatcov_rts_full
$ gprinstall -Pgnatcov_rts_full

Note that depending on your specific setup, the above may not work without special filesystem permissions, for instance if the toolchain/GPRbuild was installed by a superuser. In that case, you can install the runtime to a dedicated directory and update your environment so that GPRbuild can find it: add the --prefix=/dedicated/directory argument to the gprinstall command, and add that directory to the GPR_PROJECT_PATH environment variable.

A first example

Now that the prerequisites are set up, we can go ahead with our first example. Let’s create a very simple program:

-- example.adb
procedure Example is
begin
   […]
end Example;

-- example.gpr
project Example is
   for Main use ("example.adb");
   for Object_Dir use "obj";
end Example;

Before running gnatcov, let’s make sure that this project builds fine:

$ gprbuild -Pexample -p
$ obj/example
[…]

Great. So now, let’s instrument this program to compute its code coverage:

$ gnatcov instrument -Pexample --level=stmt --dump-trigger=atexit

As its name suggests, the “gnatcov instrument” command instruments the source code of the given project. The -Pexample and --level=stmt options should be familiar to current GNATcoverage users: the former requests the use of the “example.gpr” project, to compute the code coverage of all of its units, and --level=stmt tells gnatcov to analyze statement coverage.

The --dump-trigger=atexit option is interesting. As discussed earlier, instrumented programs need to dump their coverage state into a file (the trace file) that “gnatcov coverage” reads in order to produce a coverage report. But when should that dump happen? Since one generally wants reports to show all discharged obligations (fancy words meaning: executed statements, decision outcomes exercised, …), the goal is to create the trace file after all code has executed, right before the program exits. However, some programs are designed to never stop, running an endless loop (Ravenscar profile), so this trace file creation moment needs to be configurable. --dump-trigger=atexit tells the instrumenter to use the libc’s atexit routine to trigger file creation when the process is about to exit. It’s suitable for most programs running on native platforms, and makes trace file creation automatic, which is very convenient.
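As a rough illustration of the atexit mechanism, here is a toy Python analogue (my own, not GNATcoverage's actual runtime; names and file format are invented): the instrumented code flips per-obligation flags as it executes, and a handler registered with atexit writes them out when the process terminates.

```python
# Toy analogue of --dump-trigger=atexit (illustrative only, invented format):
# flags are set as code executes; the dump runs right before process exit.
import atexit

obligations = [False, False, False]   # one flag per coverage obligation

def dump_trace():
    """Write the coverage flags out; registered to run at process exit."""
    with open("toy.srctrace", "w") as f:
        f.write("".join("1" if b else "0" for b in obligations))

atexit.register(dump_trace)           # libc atexit plays this role in C
```

Registering the dump at exit is what makes trace creation automatic for ordinary terminating programs; for endless-loop designs, another trigger has to be chosen instead.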

Now is the time to build the instrumented program:

$ gprbuild -Pexample -p --src-subdirs=gnatcov-instr --implicit-with=gnatcov_rts_full

Even seasoned GPRbuild users will wonder about the last two options.

--src-subdirs=gnatcov-instr asks GPRbuild to consider, in addition to the regular source directories, all “gnatcov-instr” folders in object directories. Here that means that GPRbuild will first look for sources in “obj/gnatcov-instr” (as “obj” is example.gpr’s object directory), then for sources in “.” (example.gpr’s regular source directory). But what is “obj/gnatcov-instr” anyway? When it instruments a project, gnatcov must not modify the original sources, so instead it stores instrumented sources in a new directory. The general rule of thumb for programs that deal with project files is to use the project’s object directory (Object_Dir attribute) to store artifacts; “gnatcov instrument” thus creates a “gnatcov-instr” subdirectory there and puts the instrumented sources in it. Afterwards, passing --src-subdirs to GPRbuild is the way to tell it to build the instrumented sources instead of the original ones.

The job of --implicit-with=gnatcov_rts_full is simple: make GPRbuild consider that all projects use the gnatcov_rts_full.gpr project, even though they don’t contain a “with "gnatcov_rts_full";” clause. This allows instrumented sources (in obj/gnatcov-instr) to use features in the gnatcov_rts_full project even though example.gpr does not request it.

In other words, the --src-subdirs and --implicit-with options together allow GPRbuild to build instrumented sources with their extra requirements without having to modify the project file of the project to test/cover (example.gpr).

We are getting closer to the coverage report. All that remains is to run the instrumented program, to create a source trace file:

$ obj/example
Fact (1) = 1
$ ls *.srctrace
example.srctrace

So far, so good. By default, the instrumented program creates in the current directory a trace file called “XXX.srctrace”, where XXX is the basename of the executed binary, but one can choose a different filename by setting the GNATCOV_TRACE_FILE environment variable to the name of the trace file to create. Now that we have a trace file, the rest will be familiar to GNATcoverage users:

$ gnatcov coverage -Pexample --level=stmt --annotate=xcov example.srctrace

Second Ada + LoRa port: Heltec LoRa Node 151

The second board is the Heltec LoRa Node 151. This is an STM32L151 + SX1276:

After the Ronoth board, this one is a more modern design with more RAM. It was a smooth port, and it was soon transmitting and receiving smoothly. It's half the price of the Ronoth.

Third Ada + LoRa port: Blkbox 915MHz LoRa SX1276 module

Then there is the Blkbox 915MHz LoRa SX1276 module, here seen connected to a Bluepill (STM32F103).

This module is my favorite. It's cheap (about $7), works fine, and can be moved around from target to target. Initially, as pictured, I had it ported to the Bluepill. Later I moved it to the complex STM32L552 and the equally challenging STM32WB55, where it acts as a server.

Fourth port, the new STM32L552: the STM32L552 challenge

Whilst the other STM32 ports are not new ground for me with respect to getting Ada going on ST platforms, the STM32L552 is exceptionally challenging. I have some experience with embedded platforms, and the Cortex-M33 in its various forms from Nordic, NXP and now ST is truly one of the most complex controller designs I have ever worked with. The bugs you can get are really epic. I could fill pages here recounting all the really messy, mind-bending bugs. The bullet points below outline the work, but at each stage learning and bugs were involved.

1) It uses the ARMv8-M architecture. There is no support in OpenOCD for this target, and no direct Ada support, as the library is ARMv7-M.

2) It is based on a Cortex-M33 with TrustZone.

3) Ada_Drivers_Library is designed for a single view of the peripheral space, not a peripheral space partitioned into Secure and Non-secure areas.

4) Debugging also needs to consider ARMv8-M + TrustZone and how that affects register/memory/flash reading and writing.

5) To that end, OpenOCD was ported to this platform: https://github.com/morbos/openocd-0.10.1 From there you can attach to the board:

openocd -f board/st_nucleo_l552.cfg

The usual GDB loading commands work for reading and writing flash and RAM.

6) The methodology for the port was to bolt two separate Ada ELF32 binaries into the final image. One image is a Secure boot and API handler, plus a very small C glue layer that handles GCC ARMv8-M details, since gnat2019's GCC libraries cannot be linked with ARMv8-M code yet. Some future pragmas would also be needed in the Ada world to accommodate the special S <=> NS bindings and access to NS functions (the BLXNS and SG instructions, and cleaning up the unbanked register state before BLXNS to avoid S-state leaks to the NS side). Finally, the NS image is the last piece. For S I am using the Ravenscar SFP runtime and for NS, full Ravenscar. Before NS can toggle an LED or touch any peripheral, S has to be told what the NS side is allowed to use: extra boilerplate that is unneeded in a non-secure environment.

7) The basic structure of the flash is the boot area (formerly 0x08000000, now S at 0x0c000000 for a secure-boot Ada ELF32 binary); it currently occupies 40-60k of flash. A watermark is created in the ST flash option registers to divide the flash into two regions; I chose S from 0x0c000000 to 0x0c01ffff, and NS from 0x08020000 to the end. The secure boot area also has the veneer code to allow NS to call back into S. You need a magic SG instruction anywhere you want NS's PC to touch down; any other opcode is an abort. Also, the region for the NS PC touch-down must be marked Non-Secure Callable (NSC), or you get another cryptic abort. The S and NS Ada programs are Cortex-M4F compiler builds; the veneer code is in C, and it is compiled as Cortex-M33.

Above we see the S_To_NS Ada call going to a C s_to_ns. A magic function pointer with a special attribute ensures that the function pointer produces the BLXNS shown, plus a boatload of assembly to wipe the CPU registers so that no leaks are present to the NS side. Note that the NS side is a full Cortex-M executable with its own vector table. Then we have the NS side calling the LoRa radio IP's API. Let's look at Recv_SX1276 as an example API call. The call goes to the C wrapper recv_sx1276_from_ns. Notice that due to its attribute, its veneer entry point starts with the ARMv8-M instruction sg: an sg is the only instruction NS can execute upon arrival in S. Any other instruction and you will be in the debugger, debugging a crash. After the sg, a veneer branch is made to a special veneer that finally calls our Ada code via an export of Recv_SX1276 as C-callable.
Upon arrival, we see pointer types. How do we, on the secure side, know that those pointers are safe (i.e. NS)? We need to use another ARMv8-M instruction to validate the pointer(s): the tt instruction, as shown. If any of those pointers are S, we return. This is defensive coding in an S/NS environment.

A really challenging port. For example, with two Ravenscar runtimes on one SoC, when an exception comes to/from secure/non-secure, how is the context switch decided? This was very challenging, as process starvation was a very real bug. Detail: http://www.hrrzi.com/2019/12/ada-on-cm33.html

There was no software initially for this board; I ordered it in November and it came to my house in early December, so I had two months to get the whole show going.

Secure/Non-secure peripheral base handling is interesting: how Ada sets up the peripheral bases for the driver code that is shared between S and NS. We see the magic in the declaration: the S base is always shifted 16#1000_0000# from the NS base. In this way, legacy Ada_Drivers_Library code that just refers to GPIO_A, for example, will 'do the right thing' based on the stance the library was built with (the Secure_Code constant).

Fifth port, the STM32WB55 + LoRa: a LoRa server to BLE bridge

An Ada-programmed STM32WB55 rounds out the show. Here is the server having its LoRa module traffic analyzed by a Saleae. This was also a big effort, done after the last Make with Ada contest. The last contest entry was a SensorTile with Ada running the BLE stack. This WB55 is a collapsed SensorTile: the BLE radio is not an SPI peripheral here but has been absorbed into the SoC, with hardware inter-processor communication being used instead of SPI.

There were lots of issues getting a port to this platform (no SVD file initially!). I had a good bug with the LoRa SPI: the SPI flags were at the wrong bit offset, so no rx/tx notifications were received. This turned out to be an SVD file issue. It took a day to debug that one, as an SVD file has a lot of leverage in a port; it will be the last place you look.
For the STM32WB55 Nucleo board: there are two boards in the blister pack, a large board and a small dongle. I am using the large board and a Blkbox 915MHz LoRa SX1276 module. Detail:

Radio Freq

In the US, LoRa is restricted to 915MHz. I thought this Ada code really elegantly solves the frequency to the 24-bit coding the SX1276 uses. And its usage:

Radio message debug

The Saleae could capture BlkBox LoRa SPI radio traffic between the server on the STM32WB55 and the STM32L552 secure client. Very helpful. At that moment it was two Ada programs with a new protocol and issues with retries and list corruption.

One bug was quite interesting. The server is in receive 99% of the time; you might think that setting its FIFO pointer to 0 and reading the received packet would be solid. No: if the SX1276 state machine never leaves receive, it keeps an internal FIFO pointer. Since it is internal, it keeps monotonically increasing with every packet, even with an overt reset of the rx and tx pointers. Thus at each packet notification the code sees stale data at 0, as the internal pointer has moved. The 'fix' is to leave receive just long enough for that pointer to reload. That issue took another day of debugging, and there were many issues of this type.

Putting it all together

OK, so we have 6 nodes of varying pedigree and 1 gateway/server. The design of the LoRa packet data is a to/from pair of bytes (a la RadioHead firmware), but then we deviate, adding some fields for commands and values for message retries and sequence numbers.

Message Protocol

There are 4 messages. Client data structures are as so:

Node discovery

Every 5 seconds the server sends a broadcast ping. Ping replies come in from:

1) Those nodes in airshot that are ready to receive.

2) Those nodes whose non-retried reply can get back to the server. A broadcast ping creates a lot of RF hash with respect to the replies.

3) Once the server sees the reply, the node is added to a table.

This is how the network gets populated and grows as new nodes are discovered or old ones drop off.
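The stale-FIFO-pointer behavior described above can be modeled in a few lines. This is a toy Python model of my own, not the real SX1276 register interface: it just shows why re-reading from offset 0 returns the first packet forever while the radio stays in receive.

```python
# Toy model of the SX1276 stale-FIFO-pointer bug (illustrative only):
# while the state machine stays in receive, an internal write pointer
# keeps advancing, so a naive read from offset 0 sees stale data.
class ToySX1276:
    def __init__(self):
        self.fifo = bytearray(256)
        self.wr = 0                      # internal pointer, never reset in RX

    def packet_arrives(self, data):
        """The radio appends each packet at its internal write pointer."""
        for b in data:
            self.fifo[self.wr % 256] = b
            self.wr += 1

    def naive_read(self, n):
        """Buggy host code: always reads from offset 0."""
        return bytes(self.fifo[:n])      # stale after the first packet
```

Leaving receive briefly, as the post describes, is what lets the internal pointer reload so that reads and writes line up again.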
Note that 0 and 255 are not valid client node IDs.

Notifications

Once a node has been recorded by the server as active, it can begin to send and receive notifications. Today I only support 8-bit notifications; 8 bits are more than enough for my target application, and ideally I don't want long messages being sent. This 8-bit notify has been set up so that bit 0 is the LED on the board and bit 1 is a user button. Only node #1, the STM32L552, has a usable user button. The plumbing of that interrupt was interesting, by the way, as its GPIO needs to be readable by the NS side; further, it can have its interrupt routed to NS. Again, this was all new ground, requiring deep study of the reference manual and board experimentation (no reference code; everything was the try, see, iterate method). Fortunately, I was using RAM for those tests or I think the flash would be worn out by now. New Ada_Drivers_Library support for S/NS external interrupt routing:

The server's BLE stack is activated when the BLE phone application shows an LED icon press; it is a BLE notification. We then locally change the server's LED state to the requested value and set a suspension object to let the LoRa task know that the LoRa network needs to be woken up with a notify8 message. The server's notification task then walks the actives list and sends notify8s to all connected nodes. Upon receipt of a reply, the original message is removed from the queue. If, however, a node has failed to respond after a timeout, the original message with the same seq# is resent. Finally, after 15 retries with no reply, the message is retired.

When the user button is pressed, the reverse happens. The button notification task on the client is activated via a suspension object; this then prepares a LoRa packet with a notify8 with bit 2 set.
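The retry scheme just described (resend with the original sequence number on timeout, drop on reply, retire after 15 attempts) can be sketched roughly like this. This is a Python sketch with invented names, not the project's Ada code:

```python
# Rough sketch of the server's retry queue (invented names, not the
# project's Ada implementation).

MAX_RETRIES = 15

class Pending:
    def __init__(self, to, payload, seq):
        self.to, self.payload, self.seq = to, payload, seq
        self.tries = 0

def on_timeout(queue, send):
    """Resend every pending message with its original seq#; retire after 15."""
    still_pending = []
    for msg in queue:
        msg.tries += 1
        if msg.tries > MAX_RETRIES:
            continue                        # retired: give up on this node
        send(msg.to, msg.payload, msg.seq)  # same seq# on every resend
        still_pending.append(msg)
    return still_pending

def on_reply(queue, from_node, seq):
    """A reply removes the matching original message from the queue."""
    return [m for m in queue if not (m.to == from_node and m.seq == seq)]
```

Keeping the sequence number stable across resends is what lets a node detect and discard duplicates when its reply was lost rather than the original message.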
Assuming it got over the air to the server after its own client retry count, the server processes the notify8 by setting another suspension object that wakes up the same BLE task that handles the local button press. This is then transmitted to the BLE phone application, where the event is shown with a timestamp.

Demo

A demo has been coded up. The idea is that, via ping packet replies, the server can build up an array of who is connected. Once that map is created, the server knows which nodes are alive on the LoRa network. From there, via the Android app, we make a connection to LoRaDa (the name I gave the Ada BLE server on the STM32WB55). From there you can turn on a light and see any alerts. At this time, I only have a user button on the STM32L552 Nucleo board, so it is the only board that sends the alert. The nodes in the LoRa network are as so:

0: STM32WB55 (Ada-coded LoRa server + Ada BLE central)
1: STM32L552 (Ada-coded secure boot for the radio, and non-secure LoRa client)
2: Heltec LoRa Node 151 #1 (Ada-coded LoRa client)
3: Heltec LoRa Node 151 #2 (Ada-coded LoRa client)
4: Ronoth LoDev S76S #1 (Ada-coded LoRa client)
5: Ronoth LoDev S76S #2 (Ada-coded LoRa client)
6: Bluepill STM32F103

So as nodes arrive in the network, when the light button is toggled on the Android app, it cycles through all the connected nodes and sets each one's light to the requested state. All the while, node #1 can signal an alert that shows up as a red bell icon with a timestamp in the app.

Range

The range is outstanding. I walked probably 500m from our house and still had a signal! This tech is more than adequate for low-rate sensor data.

Ada

I doubt I could have hacked this without using Ada. Once I had the client code up on the STM32L552 and was migrating the changes to the Heltec, Ronoth and Bluepill boards, I don't think I spent more than 30 minutes getting the client changes up and working smoothly on those targets.
I had already done a bare-bones LoRa port to each, but not the client task version. So Ada made the job a pleasure, and at least in the doldrums of code bugs I could rely on the fact that the compiler was 100% there and solid despite my code problems.

Conclusion

I thought last year's Make with Ada was a challenge, but this one was seemingly straightforward up until the Cortex-M33 needed much attention, especially low-level work such as flashing S and NS images in OpenOCD, which pulled me away from the LoRa client and server. By the way, those flashing bugs and their fixes are quite fascinating, but frustrating too given the schedule.

The server too was added late in the project! Originally I had a Dragino LG01 LoRa access point. That LoRa node is programmed in a C++ Arduino environment: doable, but not Ada-esque. Finally a bulb went on in my head: why not use the BLE work on the STM32WB55 and make a LoRa to BLE bridge? How hard could that be? :) It was another challenge on top of an already challenging project.

Much of the work was on the protocol, debugging radio code on both the client and the server. The server has the most advanced radio work, as it has an async receive task. The server is in receive 99% of the time and drops out of that to transmit from time to time. Timing is really subtle with these tasks; if you mess it up, the BLE side can drop the connection or the LoRa network starts too many retries. I spent quite a bit of time on this protocol and it still needs some work.

One item for Ada users that can make LoRa quite inexpensive: a Bluepill (<$2) and a BlkBox LoRa module ($6-$7). So for under $10, you can have an Ada-controlled LoRa module. A good value, I feel. Ultimately, ST will release a 48-pin 7x7 STM32L5 processor that will be pin-compatible with the Bluepill boards. That will be Cortex-M33 based and might make for another interesting secure IoT solution. I plan to do a chip swap when it's available, as I did for the STM32L443 before.
About two weeks ago, ST announced the STM32WL, a single-chip LoRa controller. All the solutions I showed are dual-chip: a controller plus a radio. Through some deal with Semtech, ST will have a single chip with the radio IP absorbed. Of course, the Ada code I worked on will need to run on this device when it's available. Finally, whilst an Ada LoRa network and LoRa<->BLE gateway is novel, my feeling is that the most interesting part of the work is the progress on the previously Ada-untouched Cortex-M33. Ada is well known as a safe and secure language; the Cortex-M33 is a secure processor. So the marriage is a good one. Let's see if the Ada community can make some progress with this CPU; I have already prepared a path and shown that it is quite possible to get good results from it.

• Access the project code here.

Relaxing the Data Initialization Policy of SPARK
Claire Dross, Tue, 28 Jul 2020
https://blog.adacore.com/relaxing-the-data-initialization-policy-of-spark

SPARK being always under development, new language features make it into every release of the tool, be they previously unsupported Ada features (like access types) or SPARK-specific developments. However, new features generally take a while to make it into actual user code. The feature I am going to present here is, in my experience, an exception, as it was used both internally and by external users before it made it into any actual release. It was designed to enhance the verification of data initialization, whose limitations have been a long-standing issue in SPARK.

In the assurance ladder, data initialization is associated with the bronze level, that is, the easiest to reach through SPARK. Indeed, most of the time, the verification of correct data initialization is achieved automatically, without much need for user annotations or code modifications.
However, once in a while, users encounter cases where the tool cannot verify the correct initialization of some data in their program, even though it is correct. Until recently, there was no good solution to this problem: no additional annotation effort could help, and users had to either accept the check messages and verify proper initialization by other means, or perform unnecessary initialization to please the tool. This has changed in the most recent releases of SPARK (SPARK Community 2020 and recent previews of SPARK Pro 21). In this post, I describe a new feature, called Relaxed_Initialization, designed to help in this situation.

First, let's get some insight into the problem. SPARK performs several analyses. Among them, flow analysis is used to follow the flow of information through variables in the program. It is fast and scales well, but it is not sensitive to values in the program. Said otherwise, it follows variable names through the control flow, but does not try to track their values. The other main analysis is formal proof. It translates the program into logical formulas that are then verified by an automated solver. It is precise, as it models the values of variables at every program point, but it is potentially slow and requires user annotations to summarize the effects of subprograms in contracts. Verifications done by flow analysis are in general easier to complete, and so are associated with the bronze level in the assurance ladder, whereas verifications done by proof require more user input and are associated with the silver level or higher.

In SPARK, data initialization is in general handled by flow analysis. Indeed, most of the time, it is enough to look at the control flow graph to decide whether something has been initialized or not. However, using flow analysis for verifying data initialization induces some limitations.
Most notably:

• Arrays are handled as a whole, because flow analysis would need to track values to know which indexes have been written by a component assignment. As a result, SPARK is sometimes unable to verify code which initializes an array by parts (using a loop, for example, as opposed to a single assignment through an aggregate).

• As it does not require user annotations for checking data initialization, SPARK enforces a strict data initialization policy at subprogram boundaries. In a nutshell, all inputs should be entirely initialized on subprogram entry, and all outputs should be entirely initialized on subprogram return.

In recent releases of SPARK, it is possible to use proof instead of flow analysis to verify the correct initialization of data. This has the effect of increasing the precision of the analysis, at the cost of a slower verification process and an increased annotation effort. Since this is a trade-off, SPARK allows users to choose whether they want to use flow analysis or proof in a fine-grained manner, on a per-variable basis. By default, the lighter approach is preferred, and initialization checks are handled by flow analysis. To use proof instead, users should annotate their variables with the Relaxed_Initialization aspect.

To demonstrate how this can be used to lift previous limitations, let us look at an example. As stated above, arrays are treated as a whole by flow analysis. Since initializing an array using a loop is a regular occurrence, flow analysis has some heuristics to recognize the most common cases. However, this falls short as soon as the loop does not cover the whole range of the array, elements are initialized more than one at a time, or the array is read during the initialization. In particular, this last case occurs if we try to describe the behavior of a loop using a loop invariant. As an example, Add computes the element-wise addition of two arrays of natural numbers:

   type Nat_Array is array (Positive range 1 .. 100) of Natural;

   function Add (A, B : Nat_Array) return Nat_Array with
     Pre  => (for all E of A => E < 10000)
       and then (for all E of B => E < 10000),
     Post => (for all K in A'Range => Add'Result (K) = A (K) + B (K))
   is
      Res : Nat_Array;
   begin
      for I in A'Range loop
         Res (I) := A (I) + B (I);
         pragma Loop_Invariant
           (for all K in 1 .. I => Res (K) = A (K) + B (K));
      end loop;
      return Res;
   end Add;

The correct initialization of Res cannot be verified by flow analysis, because it cannot make sure that the invariant only reads initialized values. If we remove the invariant, then the initialization is verified, but of course the postcondition is not... Until now, the only solution to work around this problem was to add a (useless) initial value to Res using an aggregate. This was less than satisfactory... In recent versions of SPARK, I can instead specify that I want the initialization of Res to be verified by proof, using the Relaxed_Initialization aspect:

   Res : Nat_Array with Relaxed_Initialization;

With this additional annotation, my program is entirely verified. Note that, when Relaxed_Initialization is used, the bronze level of the assurance ladder is no longer enough to ensure the correct initialization of data. We now need to reach the silver level, which may require adding more contracts and doing more code refactoring.

Let's now consider the second major limitation of the classical handling of initialization in SPARK: the data initialization policy. As I mentioned earlier, it requires that inputs and outputs of subprograms be entirely initialized at subprogram boundaries. As an example, consider the following piece of code, which tries to read several natural numbers from a string using a Read_Natural procedure. It has an Error output which is used to signal errors occurring during the read:

   type Error_Kind is (Empty_Input, Cannot_Read, No_Errors);
   subtype Size_Range is Natural range 0 .. 100;

   procedure Read_Natural
     (Input    : String;
      Result   : out Natural;
      Num_Read : out Natural)
   with Post => Num_Read <= Input'Length;
   --  Read a number from Input. Return in Num_Read the number of
   --  characters read.

   procedure Read
     (Input  : String;
      Buffer : out Nat_Array;
      Size   : out Size_Range;
      Error  : out Error_Kind)
   is
      Num_Read : Natural;
      Start    : Positive range Input'Range;
   begin
      --  If Input is empty, set the error code appropriately and return

      if Input'Length = 0 then
         Size  := 0;
         Error := Empty_Input;
         return;
      end if;

      --  Otherwise, call Read_Natural until either Input is entirely
      --  read, or we have reached the end of Buffer.

      Start := Input'First;
      for I in Buffer'Range loop
         Read_Natural (Input (Start .. Input'Last), Buffer (I), Num_Read);

         --  If nothing can be read from Input, set the error mode and return

         if Num_Read = 0 then
            Size  := 0;
            Error := Cannot_Read;
            return;
         end if;

         --  We have reached the end of Input

         if Start > Input'Last - Num_Read then
            Size  := I;
            Error := No_Errors;
            return;
         end if;

         Start := Start + Num_Read;
      end loop;

      --  We have completely filled Buffer

      Size  := 100;
      Error := No_Errors;
   end Read;

This example does not follow the data initialization policy of SPARK, as I don't initialize Buffer when returning with an error. In addition, if Input contains fewer than 100 numbers, Buffer will only be initialized up to Size. If I launch SPARK on this example, flow analysis complains, stating that it cannot ensure that Buffer is initialized at the end of Read. To silence it, I could add a dummy initialization for Buffer at the beginning, for example setting every element to 0. However, this is not what I want. Indeed, not only might this initialization be costly, but callers of Read may forget to check the error status and read Buffer, and SPARK won't detect it. Instead, I want SPARK to know which parts of Buffer are meaningful after the call, and to check that only those are accessed by callers.
Here again, I can use the Relaxed_Initialization aspect to exempt Buffer from the data initialization policy of SPARK. To annotate a formal parameter, I need to supply the aspect on the subprogram and mention the formal as a parameter:

   procedure Read
     (Input  : String;
      Buffer : out Nat_Array;
      Size   : out Size_Range;
      Error  : out Error_Kind)
   with Relaxed_Initialization => Buffer;

Now my procedure is successfully verified by SPARK. Note that I have initialized Size even when the call completes with errors. Indeed, Ada says that copying an uninitialized scalar, for example when giving it as an actual parameter in a subprogram call, is a bounded error. So the Relaxed_Initialization aspect wouldn't help here, as I would still need to initialize Size on all paths before returning from Read.

Let's write some user code to see if everything works as expected. Use_Read reads up to 100 numbers from a string and prints them to the standard output:

   procedure Use_Read (S : String) is
      Buffer : Nat_Array;
      Error  : Error_Kind;
      Size   : Natural;
   begin
      Read (S, Buffer, Size, Error);
      for N of Buffer loop
         Ada.Text_IO.Put_Line (N'Image);
      end loop;
   end Use_Read;

Here SPARK complains that Buffer might not be initialized on the call to Read. Indeed, as the local Buffer variable does not have the Relaxed_Initialization aspect set to True, SPARK attempts to verify that it is entirely initialized by the call. This is not what I want, so I annotate Buffer with Relaxed_Initialization:

   Buffer : Nat_Array with Relaxed_Initialization;

Now, if I run SPARK again on my example, I get another failed initialization check, this time on the call to Put_Line inside my loop. This one is expected, as I do not check the error status after my call to Read.
So I now fix my code so that it only accesses indices of Buffer which have been initialized by my read:

   procedure Use_Read (S : String) is
      Buffer : Nat_Array with Relaxed_Initialization;
      Error  : Error_Kind;
      Size   : Natural;
   begin
      Read (S, Buffer, Size, Error);
      if Error = No_Errors then
         for N of Buffer (1 .. Size) loop
            Ada.Text_IO.Put_Line (N'Image);
         end loop;
      end if;
   end Use_Read;

Unfortunately, it does not help, and the failed initialization check on the call to Put_Line remains. This is because I have not given any information about the initialization of Buffer in the contract of Read. With the usual data initialization policy of SPARK, nothing is needed, because SPARK enforces that all outputs are initialized after the call. However, since I have opted out of this policy for Buffer, I now need to use a postcondition to describe its initialization status after the call. This can be done easily using the 'Initialized attribute:

   procedure Read
     (Input  : String;
      Buffer : out Nat_Array;
      Size   : out Size_Range;
      Error  : out Error_Kind)
   with Relaxed_Initialization => Buffer,
        Post => (if Error = No_Errors
                 then Buffer (1 .. Size)'Initialized
                 else Size = 0);

The postcondition states that if no errors occurred, then Buffer has been initialized up to Size. If I want my code to be fully proved, I also need to supply an invariant at the end of the loop inside Read:

   pragma Loop_Invariant (Buffer (1 .. I)'Initialized);

Now both Read and Use_Read are entirely proved, and if I tweak Use_Read to access a part of Buffer with no meaningful values, SPARK will produce a failed initialization check.

The Relaxed_Initialization aspect provides a way to opt out of the strict data initialization policy of SPARK and to work around the inherent imprecision of flow analysis on value-sensitive checks. It enables the verification of valid programs which used to be out of the scope of the proof technology offered by SPARK. You can find more information in the user guide.
Don't hesitate to try it in your project, and tell us if you think it is useful and how we can improve it!

Make with Ada 2020: Disaster Management with Smart Circuit Breaker
https://blog.adacore.com/make-with-ada-2020-disaster-management-smart-circuit-breaker
Thu, 09 Jul 2020 14:18:03 +0000
Emma Adby

Shahariar's project won a finalist prize in the Make with Ada 2019/20 competition. This project was originally posted on Hackster.io here.

Story

Introduction

A Miniature Circuit Breaker (MCB) interrupts mains power when a short circuit or over-current occurs. Its purpose is to protect the electrical system from fire hazards. A smart circuit breaker will not only function as a regular MCB but also isolate the incoming AC mains supply during a disaster by sensing an earthquake, fire/smoke, gas leakage or flood water. By disconnecting the incoming power lines to equipment and power outlets inside a house/office/industry during any disaster, it can reduce the chance of electrical hazard and ensure the safety of people's lives and assets. This system is programmed in Ada, where safety and security are critical.
Demonstration

Hardware and Theory of Operation

Hardware modules

The following parts are connected together to assemble the hardware according to the schematic below:
• Microbit: Runs the safe firmware written in Ada for the system
• MMA8653 Accelerometer: Earthquake sensing, onboard I2C sensor
• 10 RGY LED Module: Fault status indication, CC connection
• Buzzer: Fault alarm beeping and tone generation
• TL431 External Reference: 2.5V reference for ADC measurement
• Laser & Photo Transistor: Smoke sensing with light interruption
• MQ-5: Natural gas (CnH2n+2) leakage sensor
• Flood Sensor: Electrode that detects the presence of flood water
• Infrared Flame Sensor: Detects fire breaking out nearby
• TP4056 LiPo Charger Module: Charges up the backup battery
• Boost Module: Converts 3.0-4.2 V from the LiPo to 5.0V DC
• Protoboards: Substrate and interconnection between modules
• Power Supplies: LiPo battery (backup) and 5V adapter (primary)
• MCB / Relay Module***: Connects/disconnects mains
• Servo Motor: Trips the MCB when smoke/fire/gas/vibration/water is sensed
• B & A buttons: Acknowledge a fault and resume normal operation

*** Note: A relay was not used, but one can be used instead of an MCB

Hardware Pin Map

All the GPIO, ADC and I2C pins are utilized as follows:

Schematic

Here is the schematic for the Smart Circuit Breaker hardware prototype:

Device Operation

The device operates according to the following flowchart:
• In the Ada code, all the I/Os associated with sensors, modules and indication LEDs are initialized first.
• Next, smoke, flame, natural gas, earthquake and flood sensing happens sequentially until a fault condition is detected.
• Immediately after any fault detection, the MCB is tripped by the servo motor.
• Then, the LEDs associated with that fault keep blinking and the buzzer keeps alarming.
• The user needs to press button B to acknowledge the fault after taking care of the situation/disaster that triggered the fault in the first place.
• Finally, the user flips the MCB manually back to the 'On' position and then presses A to resume sensing again.
• If a short circuit or over-current occurs, the MCB will just trip like a regular MCB.

Build

On a piece of protoboard, the battery and charger modules are connected and secured with double-sided tape and hot glue. This is the bottom-layer circuit for powering the rest of the components. Header pins are soldered to carry power to the next layer. On the second layer (i.e. the top one), the rest of the sensors, the modules and the Microbit are connected according to the schematic. The servo motor is tied to the regular circuit breaker with a cable tie and connected to the top-layer board to get power from the battery and the control signal from the Micro:bit.

Preparation for Ada programming

Install all of these in the same directory/folder. Where to start: GNAT Programming Studio

After downloading/installing the GNAT IDE and the arm driver into the same directory, open the example code from:

C:\GNAT\2019\share\gps\templates\microbit_example

Open one of the examples (i.e. digital_out.gpr) for the Microbit according to the following steps and edit the example project code as needed.
• Step 1: Run GPS (GNAT Programming Studio)
• Step 2: Select the digital_out.gpr example project
• Step 3: Copy this project's code attached below and replace the example code in the main.adb file

Programming in Ada

The following files are the most important when working with GNAT Studio:

.gpr file is the GNAT project file for a project
.adb file is the file where the Ada code resides (src folder)
.ads files are where the definitions and declarations go

The code snippets below are taken from the attached code of this project to briefly explain essential Ada programming styles.

Writing Comments in Ada

Comments / non-executable lines in Ada start with "--" like this:

----------------- edge connector pin mapping ----------------------
-------------------------------------------------------------------
-- See here : https://makecode.microbit.org/device/pins -----------
-- pin(code)   pin (edge connector pads)   hardware connected
-- 0           large pad 0                 servo motor control pin
-- 1           large pad 1                 Flame Sense IR module

Anything after -- in a single line is a comment, whereas regular syntax ends with a semicolon (;).

Including Packages in Ada

The "with" keyword is used to add package support to a program. When the 'use' keyword is also used for that package, it becomes visible/usable in the code:

with MicroBit.IOs; use MicroBit.IOs;         -- includes microbit GPIO package
with MicroBit.Time;                          -- includes microbit time package
with MicroBit.Buttons; use MicroBit.Buttons; -- includes button package
with MMA8653; use MMA8653;                   -- includes hal for accelerometer
with MicroBit.Accelerometer;                 -- includes accelerometer package

For example: "with MicroBit.IOs" includes microbit GPIO control support in the main.adb code.
But including "use MicroBit.IOs" makes it possible to use variable types from the MicroBit.IOs package (see below: Variables in Ada for a detailed explanation). Similarly, MMA8653 and MicroBit.Accelerometer enable support for the onboard accelerometer chip of the Microbit.

Variables/Constants in Ada

Variables are declared in Ada in the following format:
• Variable_Name : Type := Initial_Value;
• Variable_Name : Type;

Connected is a variable name, which is Boolean type; its initial value is True. Fault_Flag is a variable name, which is Integer type; its initial value is 0.

Connected  : Boolean := True;           -- boolean type variable
Fault_Flag : Integer := 0;              -- integer type variable
ADCVal     : MicroBit.IOs.Analog_Value; -- variable type for ADC reading
ADCtemp    : MicroBit.IOs.Analog_Value; -- ADC type temp variable
RedLED1_Smoke : constant MicroBit.IOs.Pin_Id := 13;
RedLED2_Flame : constant MicroBit.IOs.Pin_Id := 8;

Variable types are 'strict' in Ada. For example: ADCVal is not 'Integer' type but 'MicroBit.IOs.Analog_Value' type, although it will hold integer numbers between 0 and 1023. Similarly, RedLED1_Smoke has a constant value of 13, but it is not an 'Integer' type constant; it is actually a 'MicroBit.IOs.Pin_Id' type constant.

To use these package-specific types of variables, like MicroBit.IOs.Analog_Value and MicroBit.IOs.Pin_Id, the coder must include the 'use MicroBit.IOs;' line of code before the variable declaration. The 'use' keyword allows the programmer to use package-specific variable types.

Ada Main Procedure and Loop

The main procedure in Ada is the main function (equivalent of void main in C), which starts with the 'procedure Main is' syntax; then comes the variable declaration. After that, the 'begin' keyword begins the main procedure. Below 'begin' is the code which is usually initialization or single-run code. Next starts the infinite 'loop' (equivalent of while(1) in C). Finally, 'end loop;' encloses the infinite loop and 'end Main;' ends the main procedure.
Here is the Ada code skeleton with comments showing what goes where:

-- package inclusion goes here
procedure Main is
   -- variable declaration goes here
begin
   -- initialization or one-time executable code goes here
   loop
      -- body of recurring or looping code goes here
   end loop;
end Main;

; (semicolon) marks the end of a loop or procedure. There is no use for curly braces {}.

If/else in Ada

In Ada, if-else starts with the 'if' keyword followed by a logical condition and the 'then' keyword; next is the code which will execute if the condition is true, otherwise the code below 'else' will execute. The 'if' statement ends with the 'end if;' keyword.

if condition_is_true then
   -- do this
else
   -- do that
end if;

Example:

if ADCVal >= ADCtemp then
   MicroBit.IOs.Set (RedLED1_Smoke, True); -- Write High to disable LED
else
   Fault      := True;
   Fault_Flag := 1;
   Connected  := False;
end if;

For Loop in Ada

for tempval in 0 .. 9 loop
   MicroBit.IOs.Set (Servo_Pin, True);
   MicroBit.Time.Delay_Ms (1);
   MicroBit.IOs.Set (Servo_Pin, False);
   MicroBit.Time.Delay_Ms (19);
end loop;

Case and null in Ada

Case in Ada starts with the 'case' keyword, followed by the variable which will be checked and the 'is' keyword. Then it checks for a match with the 'when' keyword followed by different possible values of the variable, ending with the '=>' operator. Next is the code which executes when the variable matches a possible value. The 'when others' keyword is for the no-match condition. Case ends with the 'end case;' keyword.

'null;' is for doing nothing when no match is found, which needs to be explicitly mentioned. Nothing is left for guesswork in Ada!
case variable_name is
   when 1 =>
      -- do this
   when 2 =>
      -- do that
   when others =>
      null; -- do nothing
end case;

Example:

case Fault_Flag is
   when 1 => -- smoke fault blinkey
      MicroBit.IOs.Set (RedLED1_Smoke, False);
      MicroBit.Time.Delay_Ms (100);
      MicroBit.IOs.Set (RedLED1_Smoke, True);
      MicroBit.IOs.Set (Buzzer_pin, False);
      MicroBit.Time.Delay_Ms (100);
   when 2 => -- fire fault blinkey
      MicroBit.IOs.Set (RedLED2_Flame, False);
      MicroBit.Time.Delay_Ms (100);
      MicroBit.IOs.Set (RedLED2_Flame, True);
      MicroBit.IOs.Set (Buzzer_pin, False);
      MicroBit.Time.Delay_Ms (100);
   when 3 => -- gas fault blinkey
      MicroBit.IOs.Set (RedLED3_NGas, False);
      MicroBit.Time.Delay_Ms (100);
      MicroBit.IOs.Set (RedLED3_NGas, True);
      MicroBit.IOs.Set (Buzzer_pin, False);
      MicroBit.Time.Delay_Ms (100);
   when 4 => -- earthquake fault blinkey
      MicroBit.IOs.Set (YellowLED1_Quake, False);
      MicroBit.Time.Delay_Ms (100);
      MicroBit.IOs.Set (YellowLED1_Quake, True);
      MicroBit.IOs.Set (Buzzer_pin, False);
      MicroBit.Time.Delay_Ms (100);
   when 5 => -- flood water fault blinkey
      MicroBit.IOs.Set (YellowLED2_Flood, False);
      MicroBit.Time.Delay_Ms (100);
      MicroBit.IOs.Set (YellowLED2_Flood, True);
      MicroBit.IOs.Set (Buzzer_pin, False);
      MicroBit.Time.Delay_Ms (100);
   when others => -- do nothing
      null;
end case;

Microbit specific APIs

• MicroBit.Time.Delay_Ms (integer) -- delays operation for a certain number of milliseconds
• MicroBit.IOs.Set (Pin_Number, boolean) -- output-drives a GPIO pin
• MicroBit.IOs.Analog (Pin_Number) -- returns an ADC value from an analog pin
• MicroBit.Buttons.State (Button_Name) = Pressed -- reads the A/B buttons

To use these Microbit-specific APIs, the following packages must be included first:

with MicroBit.IOs; use MicroBit.IOs;
with MicroBit.Time;
with MicroBit.Buttons; use MicroBit.Buttons;

• use MicroBit.IOs enables the use of the MicroBit.IOs.Analog_Value type
• use MicroBit.Buttons enables the use of the Pressed type

Examples:

with MicroBit.IOs; use MicroBit.IOs;         -- includes microbit GPIO lib
with MicroBit.Time;                          -- includes microbit timer lib
with MicroBit.Buttons; use MicroBit.Buttons; -- includes ubit button A/B lib

MicroBit.Time.Delay_Ms (500);       -- 500 ms delay
MicroBit.IOs.Set (2, True);         -- sets pin 2 Logic-High
MicroBit.IOs.Set (1, False);        -- sets pin 1 Logic-Low
ADCVal : MicroBit.IOs.Analog_Value; -- analog_value type variable, not an int
ADCVal := MicroBit.IOs.Analog (0);  -- returns a value between 0 and 1023
MicroBit.Buttons.State (Button_A) = Pressed -- returns True if A is pressed

Uploading Code

Once the editing of the code is done, connect the Microbit to the computer with a USB cable (Windows will make its ding-dong sound). Then click: Build > Bareboard > Flash to Board > main.adb to flash the code to the Microbit. The Message window below will show the code size and upload percentage. If an upload problem occurs, check the USB cable or reinstall pyOCD.

Ada Programming: Where does Ada shine?

Ada isn't just another programming language. It shines where safety, security and reliability matter. In systems where a hidden firmware/software bug could be fatal or life-threatening, or where damage to equipment might cause huge economic loss: those are the kinds of systems where Ada can make a huge difference. For example, embedded systems used in:
• Pacemakers & ICU Medical Equipment
• Self-Driving Vehicles
• Explosive Igniters
• Missile Guidance & Parachute Launchers
• Spaceship Life Support Systems
• Lift Control
• Fire Alarm & Safety
• Automated Security
• Enterprise Server Power Monitoring
• Fail-Safe Mechanism Monitoring
• Power Plant Steam Generation
• Radioactivity Monitoring
• Chemical Process Control
• Safety-Critical Consumer Electronics (e.g. Induction Cookers)

How does Ada make a system safe and secure?

The Ada compiler is very strict; it will keep bashing the coder/programmer with errors, warnings and suggestions until clear, well-thought-out code is produced. Well, compilers for other programming languages do that, too! But the difference is that things that are not even an error in other programming languages are errors in Ada.
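As a small illustration of this strictness, the fragment below (a hypothetical example, not taken from the project code) is rejected by the compiler until the integer is explicitly converted:

```ada
procedure Mixing_Demo is
   Reading : Float   := 3.5;
   Count   : Integer := 2;
   Total   : Float;
begin
   --  Total := Reading + Count;      -- rejected: Float + Integer is a type error
   Total := Reading + Float (Count);  -- accepted: explicit conversion required
end Mixing_Demo;
```

In C, the commented-out line would compile silently via implicit promotion; Ada forces the programmer to state the conversion.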
Someone coming from C or Arduino land will feel the punch, for example when trying to add a float to an integer. In Ada, an Apple does not add up with a Banana. "Think first, code later" is the principle which Ada promotes! The programmer must think clearly about the impact of each type/variable and code in a proper manner. There are other differences too, like writing style and operators.

Practical Design Considerations

This prototype is designed in a way that all the functions can be demonstrated easily. But for practical use, the following actions are recommended:
• Both the smoke and flame sensors are sensitive to strong light, therefore proper shielding from direct light is recommended
• The flood sensor should be placed near the floor, where it can easily detect indoor flood water
• The earthquake sensor is susceptible to vibrations, which is why the Smart MCB should be mounted on a rigid structure
• The gas sensor requires a 24-hour break-in period for proper operation
• A proper PCB and enclosure are necessary for hardware reliability

References

Conclusion

As I have already said, this is just a hardware prototype which I made with my limited resources. But the Smart MCB is exactly the kind of application (safety-critical) for which the spirit of Ada programming is intended. The MCB was invented and patented by Hugo Stotz almost 100 years ago. I hope someone out there can turn this project into a real product and upgrade the century-old MCB technology into the Smart MCB for improved safety of next-generation electrical distribution systems.

• Access and download the project schematics here.
• Access the project code here.
• GNAT Community was used in this project, download it here.

Make with Ada 2020: CryptAda - (Nuclear) Crypto on Embedded Device
https://blog.adacore.com/make-with-ada-2020-cryptada
Thu, 25 Jun 2020 11:17:34 +0000
Emma Adby

Team CryptAda's project won a finalist prize in the Make with Ada 2019/20 competition.
This project was originally posted on Hackster.io here.

Story

The project sources

As junior DevOps/SREs, we're quite interested in cryptography and its usage. Therefore, when we were asked to participate in this contest for a uni project, we decided to go for a cryptography-related subject. Given the quite limited knowledge we had of the subject, we thought that software on the embedded device that could generate RSA keys would be a good start.

The first steps - Bignums

In order to generate RSA private keys, we need a way to obtain big, very big prime numbers. The numbers we need are way too big for classical number representation in Ada. We need a way to manipulate bignums, i.e. numbers that can have hundreds of digits. Typical RSA keys as of 2020 contain 2048 or 4096 bits, so if we want to create such keys, we need bignums that can handle at least 4096 bits. For the project, we first looked at an existing dependency-free bignum library, but it was too limited and too slow. Therefore, we created our own bignum library fitting our needs (allocation on the stack, fast operations, base 256 for better performance, handling negative numbers, ...). We spent quite some time optimizing it, since bignum computations account for most of the CPU time used to generate a key.

Pseudo-prime numbers

The naive algorithm to check that a number is prime is quite easy to understand, and leaves very little room for optimization: to assert that n is prime, we check that n mod x is never 0, with x going from 2 to sqrt(n). This operation is slow, and not suitable for 2048-bit numbers. To have an efficient way to assert that a number is prime, we switched to checking that it is pseudo-prime, with a sufficient probability. A pseudo-prime number means that this number has a probability n of being prime, and functions generating pseudo-prime numbers must allow us to choose this n value.
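One classical pseudo-primality check of this kind is Fermat's test: a candidate N passes for a witness A when A**(N-1) mod N = 1. For machine-word-sized numbers it can be sketched in plain Ada as below (illustrative only: the project's real tests run on its own bignum type, the name Fermat_Passes is invented, and the multiplications would need widening to avoid overflow for large values):

```ada
--  Returns True when A**(N-1) mod N = 1, i.e. N passes the Fermat
--  test for witness A; False means N is certainly composite.
function Fermat_Passes (N, A : Long_Long_Integer) return Boolean is
   Result : Long_Long_Integer := 1;
   Base   : Long_Long_Integer := A mod N;
   Exp    : Long_Long_Integer := N - 1;
begin
   --  Square-and-multiply modular exponentiation
   while Exp > 0 loop
      if Exp mod 2 = 1 then
         Result := (Result * Base) mod N;
      end if;
      Base := (Base * Base) mod N;
      Exp  := Exp / 2;
   end loop;
   return Result = 1;
end Fermat_Passes;
```

A composite number can still pass for some witnesses (which is exactly why the test only gives a probability), hence the layered pipeline described next.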
We tweaked a bit, and found the best way to generate prime numbers with a probability > 99.999% in a reasonable time. The algorithm is the following:
- Check for it being prime with all the prime numbers < 100, using a ~sieve of Eratosthenes
- Check for it being pseudo-prime using Fermat's primality test with {2, 3, 5, 7}
- Run a Miller-Rabin test with 4 iterations using randomly generated witnesses
- Run a Miller-Rabin test with a number of iterations depending on the number being checked, with randomly generated witnesses

When a number passes all of these tests, it is safe (up to a defined probability) to consider it prime.

Random number generator

Random numbers are required to run such algorithms. But the randomness must respect some conditions. We're not only looking for a Pseudo Random Number Generator (PRNG), but for a PRNG that either mixes in entropy or is Cryptographically Secure (CSPRNG). As we're on an embedded device, we have GPIO pins and sensors, so we chose the first option, as entropy could be generated easily. We implemented a PRNG fed with entropy, inspired by the Linux PRNG (/dev/random). We have an entropy pool of 2048 bits that is constantly fed by the noise generated by the 3-axis accelerometer using a mixing function. We also have an extraction function that consumes some of the entropy available in the pool (accounted for by an estimation function that maintains a gauge every time an operation is performed on the entropy pool) to give a random number. In our application, entropy is collected periodically in the background, since we created an Ada task to perform this operation.
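Such a background collection task takes only a few lines of Ada (a skeleton only: Entropy_Pool.Mix and Sensor.Read_Acceleration are invented stand-ins for the project's actual packages, and the sampling period is a guess):

```ada
--  Periodically feeds accelerometer noise into the entropy pool.
task Entropy_Collector;

task body Entropy_Collector is
begin
   loop
      Entropy_Pool.Mix (Sensor.Read_Acceleration);  -- stir the noise into the pool
      delay 0.01;  -- sample in the background, 100 times per second
   end loop;
end Entropy_Collector;
```

Because tasking is part of the Ada language itself, the collector runs concurrently with the rest of the application without any explicit thread management.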
The implementation uses chacha20 as a hash function for the extraction process, which can be summarized as:
- Hash the whole entropy pool
- Fold the hash into a 16-byte hash
- Mix the hash back into the pool to stir it and re-credit some entropy
- Re-extract 16 bytes of entropy from the pool
- Xor and fold the initial hash and the newly extracted entropy to create a 64-bit output value

The entropy pool can also be fed by any source of entropy, and we also looked at nuclear decay. Unfortunately, our Geiger-Müller counter was shipped with a lot of delay, and we lacked the time to interface it.

RSA

With prime numbers and a random number generator, it is finally possible to create RSA keys. The project offers a graphical interface on the touch screen of the board to choose the RSA key size between 256, 512 and 768 bits, which are the biggest keys that can be generated in a reasonable time on such a low-specs board. The touch screen also permits the user to generate a key and to print it on the USART connection. The project aims to be a good start for either an embedded crypto library, or for a device like a smartcard (Yubikey). An RSA key is dumped on the USART using the ASN.1 format, but as a conf file. Here's an example of a very small RSA key generated by the board:

asn1=SEQUENCE:rsa_key

[rsa_key]
version=INTEGER:0
modulus=INTEGER:187031899687190461
pubExp=INTEGER:65537
privExp=INTEGER:113679732755818241
p=INTEGER:191569309
q=INTEGER:976314529
e1=INTEGER:45208217
e2=INTEGER:554591105
coeff=INTEGER:98718018

With openssl, the configuration above can easily be converted to the DER/PEM format, making the key suitable for most applications.

• Access and download the project schematics here.
• Access the project code here.
• GNAT Community was used in this project, download it here.
Make with Ada 2020: The SmartBase - IoT Adjustable Bed
https://blog.adacore.com/make-with-ada-the-smartbase
Thu, 11 Jun 2020 11:34:00 +0000
Emma Adby

John Singleton's The SmartBase - IoT Adjustable Bed won both the first prize and a finalist prize in the Make with Ada 2019/20 competition. This project was originally posted on Hackster.io here.

Story

My wife suffers from a rare condition that causes her to be nauseous for many hours quite often. Through trial and error, she has learned that one thing that seems to alleviate the symptoms is sitting upright and perfectly still. I noticed that often, early in the morning, she would slip out of bed and move to the couch where she could sit upright while I slept in bed. So, about a year ago, after too many nights of being uncomfortable with our bed, my wife and I decided to finally go bed shopping for a good mattress. In doing so we realized how common it is now that people purchase adjustable bases with their mattresses. It dawned on us that having an adjustable base would be excellent for her condition, as she could sit up in bed whenever she needed and she wouldn't have to go to the couch to do it. After trying one on a showroom floor we were excited to find that we loved it, and we went out and bought one with our new mattress. At first, it was amazing: being able to prop yourself up in bed. But then there were problems. When my wife wanted to raise the bed, if I wasn't there she'd have to fumble around for the remote (which always seemed to be hard to find). Often, if she wasn't feeling well, she'd give up and acquiesce to lying supine. Another problem we had was that we tended to fall asleep with the bed in the upright position. This gets quite uncomfortable for the entire night, and the effort needed to find the remote and adjust the bed would wake us from our sleep, making it hard to get back to sleep.
One night, when faced with this problem, it dawned on me: we already use our Alexa to control our lights and other devices, why not our bed? A quick Google search revealed that nothing like this had ever been done. To make it worse, adjustable beds are primitive affairs --- they don't have APIs and they aren't programmable. They rely on power control circuits to drive them. This seemed like the perfect type of project to take on.

At the time, I had recently completed my Ph.D. in Computer Science, where I focused on formal methods and new techniques for specification inference. From this experience I was already familiar with Ada and SPARK, which uses a specification language similar to my advisor's own JML. Although I knew about Ada through this experience (the Building High Integrity Applications with SPARK book was one of the first I stole from my advisor's office!), I hadn't made anything "real" in Ada, so I thought I'd do this also as a way of understanding what the state of Ada/SPARK would be for a real product.

Project Goals and Overview

The goals of this project were to create an IoT device that was:
• Able to control the movement of a wired adjustable bed.
• Able to be easily reconfigured to work with other wired adjustable beds.
• Able to be controlled by both an Amazon Alexa device as well as the original remote.
• Able to sense occupancy in the room and able to produce under-bed safety lighting when walking around at night.
• Written in Ada/SPARK 2012.
• Nicely designed in terms of fit and finish (since I planned on actually using it!). That means that the entire system should use its own custom PCBs and be hosted inside of a custom-designed case that would look great under my bed.

To explain the product, I've prepared two videos that demonstrate the main features of the SmartBase as well as showcase its physical construction.
For the interested reader, you can read the remainder of this document to learn all about the different phases of development for this project. Roughly, the phases of development this project followed were:

1. Reverse engineering, in which I figured out how to control the bed.
2. Prototyping, in which I made some rough perfboard prototypes.
3. PCB design: because no one wants a rat's nest of wires under their bed, I designed a series of PCBs to support the function of the SmartBase.
4. Enclosure design, in which I designed a case for my project.
5. Software design, which essentially happened at all phases, but I've given it its own section here.
6. Verifying things, in which I do a little bit of verification.
7. Moving to STM32, in which I describe porting the entire project over to an STM32F429ZIT6 and an ESP32 for the WiFi functionality.
8. Bitbanging a WS2812B on an STM32F4 in Ada. Neopixels are only easy to use if you are using an Arduino. Apparently no one had done this one before, so, armed with an oscilloscope and an ARM manual, I figured out how to make that happen.
9. Wrap-up: what did I think about using Ada to build the SmartBase?

For the complete code of the SmartBase running on the RPiZeroW, please visit: https://github.com/jsinglet/sm...

For the complete code of the STM32 version of the SmartBase, please see this repository: https://github.com/jsinglet/smartbase-stm -- This repo is essentially a fork of the above repository which targets the STM32. I've published them separately so it's easy to understand which repository represents which product.

However, before we jump into all that, here are some pictures of the finished project in action. I hope you enjoy reading about it as much as I enjoyed making it!

Phase 1: Reverse Engineering

The first problem I had to solve was: how do I control the adjustable base? As I've mentioned, I'm a software guy, but I'm dangerous with a multimeter.
To figure this out, I simply started by sliding under my bed and looking at how it currently functioned. My bed used a wired remote, connected with a DIN5-style connector (the big round connector commonly found in MIDI applications). Again, being a software guy, I assumed that perhaps the people who built the bed had implemented some sort of protocol over serial. I connected it to my computer and fired up Wireshark, but alas, nothing sensible came out of the remote.

So of course, my next step was to disassemble the remote. The wired remote, horrifically disassembled, is pictured below.

After some experimentation with a multimeter I was able to determine that the circuit was quite simple. Here's a picture of a whiteboard from around the time I was figuring it all out:

The logic of the circuit is actually quite simple. One of the 5 pins functions as the power, which happens to be +30V. The other pins are connected to the power via a switch and fed back into the bed. I tested the idea by taking hookup wire and touching it across the various pins in the configuration I had determined through my schematic analysis, and sure enough it worked. With this information, I was ready to construct a rough physical prototype.

Phase 2: The Super Rough Prototype

After learning the basics of how the control circuit worked, I wanted to test whether it could be controlled via relays. So, after a few trips to my local electronics hacking store SkyCraft (really a one-of-a-kind place!) and some Amazon purchases, I managed to put together a rough perfboard prototype. Sadly, I didn't take a picture of it when it was all together -- however, I saved the board, which is pictured below, connected to an off-the-shelf relay module:

Phase 3: Moving from Perfboard to PCB

One of my goals for this project was to have a tidy little box I could stick under my bed that my wife would tolerate.
The early perfboard prototype sat on top of some roughly cut tile with a lot of duct tape. I don't have a picture, but it was quite a sight -- horrifying, but memorable. If I was going to have a permanent fixture, I had to fix the rat's-nest problem, starting with the circuit.

For this, I decided to start by designing my own custom PCB to control the bed. The story of how many wrong turns and iterations of the PCB I went through could frankly fill many pages, so I'll just present my finished schematics below. Before that, however, let me just say that I used JLCPCB to do all my fabrication, including SMT part placement. I can't say enough good things about this company: they are fast, cost effective, and the boards are flawless. For my PCB layout I used Autodesk Eagle, which was great as well.

To gain a better understanding of the hardware required to run the SmartBase, I've prepared the following Block Definition Diagram, which details the main systems the SmartBase relies on to provide its functionality. At the top level we have the Relay and Power Control Module and the LED Mainboard Module. These two components roughly outline the two boards that were created for this project. In the following paragraphs we discuss each board separately.

First, let's look at the bottom board, which is responsible for power and for hosting the relays that control the bed. Power is routed in through a barrel connector on this board to all of the components within the SmartBase. Other components hosted on this board are:

In order to control the relays (safely) with GPIO signals I used a series of FQP30N06L "logic level" MOSFET transistors connected as shown in the above schematic. One cool thing I did was make the configuration of the relays jumpered, so if I ever encounter a different bed with a different control configuration, the relays can be easily rerouted.
The basic idea is that the jumper positions control which pin gets the power signal when the relay is activated.

Next is the top board, which hosts the WS2812B LED array as well as the CPU, a Raspberry Pi Zero W. The schematic and the layout file for the board are pictured below. This board hosts two 40-pin connectors: one for the GPIO cable from the bottom board and one for the header of the Raspberry Pi itself. If you can believe it, I got this board right on the first try just by following manufacturer spec sheets.

Phase 4: Enclosure Design

To build the enclosure I opted for a very simple design, which features several components:

• Two DIN-5 ports on the back. One connects to the bed and the other allows you to reconnect the remote control to the bed if you ever wish to control the bed manually. In reality I never used this feature because the voice control was so good.
• One power connector for 5VDC power.
• The case splits into 3 different bodies: the bottom body, the top body, and the "lens", which I printed out of clear PETG. The rest of the case was printed in black PLA.

One challenging aspect of this case design was coming up with appropriate screw sizes. I ended up using a combination of M2 and M3 screws. I found a bunch of assorted hex head sizes on Amazon and was able to design the case dimensions around those.

Phase 5: Software Design

In this section we will discuss the design of the Ada software that runs the SmartBase. The software described in this section is the software that drives the version of the SmartBase shown in the demo. Later, in the section on STM32, we discuss how these components are handled on an STM32F4-family processor in Ada. However, before we start, a few notes:

• SmartBase makes use of tasking. It is in fact mainly composed of 5 core tasks that handle relay control, command interfacing, MQTT command processing, and LED status control.
• Because I wanted to be able to verify things (and then run it on metal later), I enabled the Ravenscar profile.
• Portions of the application are written in SPARK. Notably, the components that control the bed are written in SPARK with some lightweight specifications around the control sequences.

Concept of Operation

The SmartBase gets inputs from PIR sensors, which trigger fade-on/fade-off events. These events are processed along with MQTT events, which arrive via an AWS Lambda function connected to the Alexa voice service. These MQTT events then turn into motion of the bed via the relay control subsystem. The following diagram provides a high-level summary of how the SmartBase performs its main operations.

Tasking

The typical way one builds microcontroller applications is via a state machine pattern encoded into the main loop of the program running on the microcontroller. For simple applications this is generally fine, but for more complex applications it is common to use the multi-tasking capabilities found in an RTOS such as FreeRTOS. That said, one of the excellent aspects of programming a system like the SmartBase in Ada is Ada's excellent (and I mean excellent) tasking facilities, built right into the language. Because of this, I opted to use Ada's tasking features to structure my application. If you'd like to learn more about this capability, I suggest you take a look at Fabien's article over on the AdaCore blog, here: There's a mini-RTOS in my language.

The SmartBase uses 5 tasks for performing its core operations:

1. The Bed Task, which is responsible for controlling access to the relay control system.
2. The CLI Task, which provides a debugging command-line interface to the SmartBase.
3. The MQTT Task, which listens for protocol events from Amazon IoT (from spoken voice commands) and talks to the Bed Task to execute protocol events.
4. The LED Task, which is responsible for providing a structured interface for controlling the LED ring. The LED ring defines states for connecting, connected, fading on, and fading off.
5. The Motion Detector Task, which really comprises several tasks and is easily the most complex of the 5. I describe the Motion Detector tasks in more detail later in this section.

The following diagram details the relationship of these five tasks in more detail. Note that in the diagram I use the method loop to indicate the main loop of each task. One stipulation of Ravenscar is that tasks do not exit, and that notation calls the restriction out.

The design of the system is that all interaction with the LED and Bed components happens strictly through the Commands interface, with the exception of the Motion Detector tasks. This interface in turn acts only through protected objects. For example, if a command arrives via the command line and the MQTT task at the same time (assuming we have more than one CPU), both will attempt to process the command through the Commands interface, which in turn ensures that access to the resources is serialized.

One interesting task is the LED_Status_Task, which is responsible for processing changes to the LED status ring. There are two problems solved in this component: 1) how to provide serialized access to the underlying LED hardware, and 2) how to ensure that transitions to different LED states are valid. The first problem is solved through protected objects. The second is covered in more detail in the next section on verification.

Lastly, the most complicated use of tasking is easily in the way the SmartBase does motion detection. As can be seen in the above figure, the MotionDetector package is composed of two tasks and two protected objects. They function in the following fashion:

• Interrupts arrive on the protected object Detector, which very quickly sets an interrupt flag.
• The Motion_Detector_Trigger_Task monitors this flag by waiting on the Triggered_Entry entry. Once the barrier releases, the Motion_Detector_Trigger_Task engages the Timer_Task via the MotionDetector_Control object's Start method.
• The Timer_Task is then responsible for changing the status of the LED ring to OFF once the detection is finished.
• Re-triggering is handled at the level of the Motion_Detector_Trigger_Task. If the LED hasn't already faded off, more time is simply added to the timeout. That way, if people keep moving, the lights remain on.

Other Software Details

Some other items worth mentioning pertaining to the Pi Zero version of this implementation:

• For MQTT I wrote C bindings to the Eclipse Paho library. On the STM32 platform, I use the MQTT AT interface built directly into the ESP32 and program it over UART.
• For LED control on the Pi I used a neopixel library and bound to it via C. On STM32 there is no such library for Ada (or anything else, really), so I wrote my own hand-coded, bit-banged WS2812B control library. What is nice about my implementation is that it is optimized for my particular use case and uses constant RAM (whereas other implementations use RAM on the order of the number of pixels in the array). You can read about it in the section on STM32.
• I did all my work in GPS, but I would really like to get my hands on the Eclipse version of the GNAT tool set and would happily accept any complimentary licenses!

Phase 6: Verifying Things

One of my goals was to check out the specification-related features of Ada with this project. To that end, I came up with two small verification tasks for my project.

1. First, rather than use a timer, part of the way the bed is controlled is through the use of tasks and protected bodies. These tasks use a barrier to control when a task should start waiting to see if it should stop moving the bed.
One aspect of this protocol is that since other commands may be received while the bed is moving (thus nullifying the current action), the tasks controlling the timeout have to know that they have been cancelled, without race conditions around the starting and stopping of the bed. I will show a little bit of what I did in the protocol in relation to specification.

2. Second, a critical element of this application is the LED status ring. In this application the LED status ring is used to indicate when the SmartBase is connecting to the internet, disconnected, and when motion is detected. Designing a system that can process all of these states at any time is trickier than it sounds, and I discuss the model I used for managing the LED status ring.

Specifications on the Bed Controller

The first thing I wanted to write specifications for was the behavior surrounding the stopping and starting of the bed, as I described earlier. To do this I wrote the following specifications, which I will explain after the listing.

procedure Do_Stop_At (Pin                    : in out Pin_Type;
                      Expiration_Slot        : Time;
                      Actual_Expiration_Slot : Time)
  with Contract_Cases =>
    (
     -- the slots match. This means
     -- we will be performing the action
     -- on the pin we expected.
     (Actual_Expiration_Slot = Expiration_Slot)  => Pin'Old = Pin,
     -- the slots DON'T match, which means we missed our window
     (Actual_Expiration_Slot /= Expiration_Slot) => Pin = Pin_None
    );

procedure Stop_At (Pin : in out Pin_Type; Expiration_Slot : Time);

procedure Stop
  with Global => (Input  => (Device.GPIO),
                  Output => (Bed_State_Ghost.Moving)),
       Pre  => True,
       Post => Bed_State_Ghost.Moving = False;

In the first specification, there are two cases. The first case is when the time slot that the timer task used to cancel the task is the one currently executing.
In this case, we expect that the pre-state value of the Pin matches the post-state value of the Pin; that is, we actually perform the stop on the pin we expected to. In the second case we require that if they are not the same, no pin is used to do anything. This is represented by the expression Pin = Pin_None.

The second set of specs are on the Stop and Start methods. These specs simply require that, in the case of Stop, we actually stop the bed, and in the case of Start, we assign a pin when the move was successful.

In the above listing, you might note that I am using an explicit ghost package to hold ghost state. Why am I doing this when Ada/SPARK 2012 has this feature built in? This was to get around an incompatibility I was having getting this to run on the Raspberry Pi, which only had an older version of GNAT available (one that didn't support ghost fields). I was able to replicate the behavior by encoding the ghost state into an actual package. Since ghosts are really just syntactic sugar for that, it works quite nicely.

Design of the LED Status Control Ring State Machine

Of course, verifying things doesn't always mean writing specifications. There are lots of ways to add more assurance around your design. The next thing I decided to analyze was the LED status ring, the states for which can be seen in the following diagram.

In the above diagram we have all of the states that the LED ring can be in. It was important to me to ensure that the following properties would hold:

• When the LED faded ON, the system would not attempt to fade it back on. It sounds like a simple thing, but I didn't want to create a disco effect.
• No matter what state the system was in, if the connection was ever lost the system would, as soon as it could, notify the user. This point is subtle. I didn't want the system to interrupt the visual effect of a fade; however, I did want the connection sequence to begin as soon as possible. Ensuring this sort of separation was critical to my design.
• Once the system was trying to connect, the visual feedback that the system was connecting should not be interrupted, even if motion detection events were happening. In my design, one doesn't have to disable motion detection -- the state machine of the LED ring simply subsumes this logic and makes such interruptions impossible.

What's nice about having a model like the one pictured above is that we can simply look at it and see whether the desired properties hold. So let's do that one by one. The first property obviously holds: the only edges going into Fade On come from None, which is not reachable from any state after Fade On. So far so good. The second property holds because all states in the state machine may make a transition to Connecting after completing their state transition. The third property holds because there are no transitions out of the Connecting state except to Connected.

Phase 7: Moving to the STM32

As I mentioned in the introduction, after completing this project on the Raspberry Pi Zero W platform I immediately began work on a version that could run on bare metal on an STM32 processor. For my development boards I selected:

I'm not totally done getting the SmartBase running on the STM32 platform, but here's a quick rundown of what is and isn't done so far:

• The bed control is done and works perfectly with the CLI interface over a serial console. You can see that demonstrated in the video above.
• The motion detection is done and works perfectly with the LED array.
• The LED controls are done, thanks to a driver for the WS2812B I wrote, which I describe in detail in Section 8.
• The MQTT work is almost done. To do this I had to build a custom ESP32 firmware to enable the MQTT AT command set on my board. This works perfectly through the serial console and I'm able to get it to pull down MQTT messages. I'm currently writing the driver that will send the AT commands over UART to the ESP32.
• The PCB design hasn't been redone for the ESP32 + STM32 combo. However, I'm really excited about getting these chips onto my PCBs and I've already looked over the reference designs.

Phase 8: Bit Banging a WS2812B on an STM32F4 in Ada

One of the most interesting parts of this project was getting the LED array working on the STM platform in Ada. The timings on the WS2812B are relatively tight, and unlike platforms like Arduino, where there is a wealth of information available, on the STM32 basically nothing exists. What little does exist could never be used in an Ada program because it is hopelessly coupled to the hardware abstraction layers and drivers provided by STMicroelectronics. Therefore, I had to start from first principles and work my way forward.

From the manufacturer specification sheets, the WS2812B implements the following protocol:

The trick to producing colors with the WS2812B is to set the rightmost 24 bits of a number to the GRB value you want. For example, to set an LED to green, you would send a waveform corresponding to the following number:

0000 0000  1111 1111  0000 0000  0000 0000
(white)    (green)    (red)      (blue)

To control multiple pixels you just repeat this for as many pixels as you have, making sure to stay within the Treset window, which practically just means you do it as fast as possible. To get reliable timings, the code that drives the WS2812B must be written in assembly with interrupts disabled.

As a proof of concept I decided to build up a basic loop to set a single LED to green. To do this I cross-referenced the cycle cost for each of the instructions, which can be found on the ARM website. As for how many instructions are required, the calculations were performed as follows: the STM32F429ZIT6 operates at 180 MHz, i.e. 180,000,000 cycles per second, which means each cycle costs about 0.00555555556 microseconds (~5.556 ns).
To achieve 800 ns we need 800 / 5.556 = 143.9, i.e. 144 single-cycle instructions of delay added to the pipeline. This works out to 144 * 5.556 ns ≈ 800 ns, which is well within the ±150 ns tolerance. Similarly, to achieve 450 ns we need 450 / 5.556 = 81 instructions of delay, which gives us 81 * 5.556 ns ≈ 450 ns, again well within the tolerance.

The trick to implementing this is to work in the time needed to load and set the bits high (or low), and to combine that with a loop that keeps the timing in check. From reading other code (such as the official neopixel code), one way people have done this is by adding explicit chains of NOP instructions to the pipeline. This works well on a 16 MHz processor like the Arduino's AVR; it doesn't work as well on a 180 MHz processor. Because I didn't want pages and pages of NOPs pasted into my program, I decided instead to work out the code using a loop structure. The code below works essentially according to the following procedure:

1. Load up a counter that dictates the number of pixels we want to do this for.
2. Load up a number that indicates which GRB value to load.
3. Loop over each bit of the color and, following the convention above, send the appropriate bit by introducing enough cycle delay to get the required timings.

Initially I worked the program out using the math shown above for a single color value. This worked well for a small proof of concept, but it was difficult to scale to the entire GRB value (let alone multiple ones). To help me expand the program I used an ARM simulator that could count cycles to ensure the loops I had written were correct; I used the VisUAL simulator for this purpose. Finally, however, I wanted to be absolutely sure that my timings were correct. I didn't own an oscilloscope, so I finally broke down and ordered one off of Amazon. I ended up getting a Hantek DSO5072P, which is a reasonably good scope for my purposes, if not a fancy one.
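As a quick sanity check on this arithmetic (and on the GRB framing described earlier), here is a short illustrative script. The names and helper functions here are mine, not part of the SmartBase code:

```python
# Illustrative check of the WS2812B timing and color math for a 180 MHz STM32F429.
# All names here are hypothetical helpers, not taken from the SmartBase sources.

F_CPU_HZ = 180_000_000           # STM32F429ZIT6 clock speed
CYCLE_NS = 1e9 / F_CPU_HZ        # ~5.556 ns per cycle

def cycles_for(ns: float) -> int:
    """Number of single-cycle instructions needed to burn roughly `ns` nanoseconds."""
    return round(ns / CYCLE_NS)

def grb(green: int, red: int, blue: int) -> int:
    """Pack a 24-bit GRB value the way the WS2812B expects it on the wire."""
    return (green << 16) | (red << 8) | blue

print(cycles_for(800))               # 144 -> delay for the ~800 ns high pulse
print(cycles_for(450))               # 81  -> delay for the ~450 ns low pulse
print(hex(grb(0xFF, 0x00, 0x00)))    # 0xff0000 -> full green
```

Note that in the actual bit-banging loop each `subs`/`bne` iteration presumably costs several cycles, which would explain why the loop counters in the assembly (36, 20, and so on) are smaller than these raw single-cycle instruction counts.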
The image below shows an example cursor session, measuring the delay of one bit. The following code shows the general principle of operation -- if you'd like to see how I implemented it in the actual Ada code, please see the end of this post for the full code. One interesting thing to note here is that most neopixel implementations consume RAM that grows with the number of pixels. Because I don't ever want any pixel to be a different color than another, I was able to make a simplification that allows my program to always require constant memory, regardless of the number of pixels being driven. For now I'm just using this library in my own project; however, if others find it useful I'd consider contributing a library that makes STM32 control of neopixels accessible to all.

        cpsid i              ; disable interrupts while bit banging
init
        mov r3, #2           ; the number of pixels to do this for
        ldr r1, =0xAAAAAA    ; load bits to be loaded
        ldr r5, =0x40021418  ; set r5 to the GPIOF register + 18 offset for BSRR
        ldr r6, =0x2000      ; pin 13 HIGH mask
        ldr r9, =0x20000000  ; pin 13 LOW mask
        mov r8, #1           ; we use this to test bits
send_pixel
        cmp r3, #0           ; test if we are done
        beq done             ; if we are out of pixels, finish up
        mov r4, #23          ; we are going to send 24 bits, prime it here.
        sub r3, r3, #1       ; decrement this pixel
send_bit
        lsl r2, r8, r4       ; build the mask by shifting over the number of bits we have
        tst r1, r2           ; check the mask against the bits we are loading.
        bne send_one         ; send a one
        b send_zero          ; otherwise, send a zero
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
send_one
        str r6, [r5]         ; set pin 13 HIGH
        ;; delay for ~ 800ns
        mov r0, #36
delay_T1H
        subs r0, r0, #1
        bne delay_T1H
        ;; end delay
        str r9, [r5]         ; set pin 13 LOW
        ;; delay for ~ 450ns
        mov r0, #20
delay_T1L
        subs r0, r0, #1
        bne delay_T1L
        ;; end delay
        b bit_sent
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
send_zero
        str r6, [r5]         ; set pin 13 HIGH
        ;; delay for ~ 400ns
        mov r0, #17
delay_T0H
        subs r0, r0, #1
        bne delay_T0H
        ;; end delay
        str r9, [r5]         ; set pin 13 LOW
        ;; delay for ~ 850ns
        mov r0, #38
delay_T0L
        subs r0, r0, #1
        bne delay_T0L
        ;; end delay
        b bit_sent
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
bit_sent
        cmp r4, #0           ; was that the last bit?
        sub r4, r4, #1       ; otherwise, decrement our counter
        beq send_pixel       ; if so, go to the next pixel
        b send_bit           ; and send the next bit
done
        cpsie i              ; re-enable interrupts

Again, if you want to see what this all looks like when it makes it back to Ada, check out the Code Section of this project.

Wrapup

In this article I detailed the creation of a novel IoT device, the SmartBase. I described how it works and detailed each phase of its development. So after all of this, what is my opinion of developing this product in Ada? Frankly, my reaction is that I cannot even imagine doing it in another language. Even though I am quite familiar with so-called "exotic" languages like Haskell, the blunt, unapologetic efficiency of Ada, the pickiness of the compiler, and the robust, built-in support for tasking and specification were a pleasure to work with, and I'm confident these features helped me find many errors that would have otherwise potentially caused problems in my system. Some other items, in no particular order:

• Scenarios are brilliant. Using scenarios I was able to make multiple versions of the SmartBase code for different boards and tie it all together quite easily.
• Compared to Java and other languages, the package system of Ada in general makes structuring a complex application much better. I love how I can have package initialization, nest objects, and in general encapsulate functionality. I think many people learn the OO facilities in a language like Java and walk away thinking "this is what objects are all about." The package system reminded me of the one in OCaml, which I also quite like.
• I love love love the support for separate compilation. When you combine that with the package system, you have a powerful mechanism for build management.
• Having a SPARK > Prove All menu in my IDE is a beautiful thing to see.

In the future I plan to use Ada on a few more projects; for example, I want to make some small, battery-powered WiFi LED strips. I can't think of a better language for the job.

• Access and download the project schematics here.
• Access the project code here.
• GNAT Community was used in this project; download it here.

CuBit: A General-Purpose Operating System in SPARK/Ada
https://blog.adacore.com/cubit-a-general-purpose-operating-system-in-spark-ada
Wed, 10 Jun 2020 10:10:00 +0000
Jon Andrew

pragma Suppress (Index_Check);
pragma Suppress (Range_Check);
pragma Suppress (Overflow_Check);
...
pragma Restrictions (No_Floating_Point);

package Compiler is
   for Default_Switches ("Ada") use (
      ...
      "-mno-sse",
      "-mno-sse2"
      ...
   );
end Compiler;

---------------------------------------------------------------------------
-- Read from a model-specific register (MSR)
---------------------------------------------------------------------------
function rdmsr (msraddr : in MSR) return Unsigned_64 is
   low  : Unsigned_32;
   high : Unsigned_32;
begin
   Asm ("rdmsr",
        Inputs   => Unsigned_32'Asm_Input ("c", msraddr),
        Outputs  => (Unsigned_32'Asm_Output ("=a", low),
                     Unsigned_32'Asm_Output ("=d", high)),
        Volatile => True);

   return (Shift_Left (Unsigned_64 (high), 32) or Unsigned_64 (low));
end rdmsr;

/* AP starting point */
AP_START = 0x7000;

/* kernel load and link locations */
KERNEL_PHYS = 0x00100000;
KERNEL_BASE = 0xFFFFFFFF80000000;
...
SECTIONS
{
    . = AP_START;

    .text_ap : AT(AP_START)
    {
        stext_ap = .;
        *(.text_ap_entry)
        etext_ap = .;
    }

    . = KERNEL_PHYS + KERNEL_BASE;
    KERNEL_START_VIRT = .;

    .text : AT(KERNEL_PHYS)
    {
        stext = .;
        build/boot.o (.text .text.*)    /* need this at the front */
        *( EXCLUDE_FILE(build/init.o) .text .text.*)
    }
    . = ALIGN(4K);
    etext = .;

    .rodata :
    {
        srodata = .;
        *(.rodata .rodata.*)
    }
    ...
    .text_ap : AT(AP_START)
    {
        stext_ap = .;
        *(.text_ap_entry)
        etext_ap = .;
    }

---------------------------------------------------------------------------
-- Symbol is a useless type, used to prevent us from forgetting to use
-- 'Address when referring to one.
---------------------------------------------------------------------------
type Symbol is (USELESS) with Size => System.Word_Size;
...

BITS 16
; we'll link this section down low, since it has to be in first
; 65535 bytes for real mode.
section .text_ap_entry
...

> readelf -a build/boot_ap.o
...
Section Headers:
  [Nr] Name            Type      Address           Offset    Size              EntSize           Flags Link Info Align
  [ 0]                 NULL      0000000000000000  00000000  0000000000000000  0000000000000000        0    0    0
  [ 1] .shstrtab       STRTAB    0000000000000000  00000320  000000000000009e  0000000000000000        0    0    0
  [ 2] .strtab         STRTAB    0000000000000000  000003c0  000000000000009d  0000000000000000        0    0    0
  [ 3] .symtab         SYMTAB    0000000000000000  00000460  0000000000000198  0000000000000018        2   15    8
  [ 4] .text           PROGBITS  0000000000000000  00000040  0000000000000000  0000000000000000  AX    0    0   16
  [ 5] .text_ap_entry  PROGBITS  0000000000000000  00000040  000000000000008e  0000000000000000  A     0    0   16
...

subtype kernelTextPages is Virtmem.PFN range
    Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.stext'Address))) ..
    Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.etext'Address) - 1));

subtype kernelROPages is Virtmem.PFN range
    Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.srodata'Address))) ..
    Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.erodata'Address) - 1));

subtype kernelRWPages is Virtmem.PFN range
    Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.sdata'Address))) ..
    Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.ebss'Address) - 1));
...

procedure determineFlagsAndMapFrame (frame : in Virtmem.PFN) is
...
begin
    if frame in kernelTextPages then
        mapPage (fromPhys, toVirtLinear, Virtmem.PG_KERNELDATARO, kernelP4, ok);

        if not ok then
            raise RemapException;
        end if;

        mapPage (fromPhys, toVirtKernel, Virtmem.PG_KERNELCODE, kernelP4, ok);
...
procedure unlink (ord : in Order; addr : in Virtmem.PhysAddress)
  with SPARK_Mode => On,
       Pre  => freeLists(ord).numFreeBlocks > 0,
       Post => freeLists(ord).numFreeBlocks = freeLists(ord).numFreeBlocks'Old - 1
is
    block : aliased FreeBlock with Import, Address => To_Address(addr);

    prevAddr : constant System.Address := block.prevBlock;
    nextAddr : constant System.Address := block.nextBlock;
begin
    linkNeighbors: declare
        prevBlock : aliased FreeBlock with Import, Address => prevAddr;
        nextBlock : aliased FreeBlock with Import, Address => nextAddr;
    begin
        prevBlock.nextBlock := nextAddr;
        nextBlock.prevBlock := prevAddr;
    end linkNeighbors;

    -- decrement the free list count when we unlink somebody
    freeLists(ord).numFreeBlocks := freeLists(ord).numFreeBlocks - 1;
end unlink;

type FreeNode is record
    next : System.Address;
    prev : System.Address;
end record with Size => 16 * 8;

for FreeNode use record
    next at 0 range 0..63;
    prev at 8 range 0..63;
end record;

type Slab is limited record
    freeList    : FreeNode;
    numFree     : Integer := 0;
    capacity    : Integer := 0;
    blockOrder  : BuddyAllocator.Order;
    blockAddr   : Virtmem.PhysAddress;
    mutex       : aliased Spinlock.Spinlock;
    alignment   : System.Storage_Elements.Storage_Count;
    paddedSize  : System.Storage_Elements.Storage_Count;
    initialized : Boolean := False;
end record;

-- GNAT-specific pragma
pragma Simple_Storage_Pool_Type (Slab);
...

objSlab : SlabAllocator.Slab;

type myObjPtr is access myObject;
for myObjPtr'Simple_Storage_Pool use objSlab;

procedure free is new Ada.Unchecked_Deallocation (myObject, myObjPtr);

obj : myObjPtr;
begin
    SlabAllocator.setup (objSlab, myObject'Size);
    obj := new myObject;
...

---------------------------------------------------------------------------
-- FADT - Fixed ACPI Description Table.
--------------------------------------------------------------------------- type FADTRecord is record header : SDTRecordHeader; firmwareControl : Unsigned_32; -- ignored if exFirmwareControl present dsdt : Unsigned_32; -- ignored if exDsdt present reserved1 : Unsigned_8; powerMgmtProfile : PowerManagementProfile; sciInterrupt : Unsigned_16; smiCommand : Unsigned_32; acpiEnable : Unsigned_8; acpiDisable : Unsigned_8; S4BIOSReq : Unsigned_8; pStateControl : Unsigned_8; PM1AEventBlock : Unsigned_32; PM1BEventBlock : Unsigned_32; PM1AControlBlock : Unsigned_32; PM1BControlBlock : Unsigned_32; PM2ControlBlock : Unsigned_32; PMTimerBlock : Unsigned_32; GPE0Block : Unsigned_32; GPE1Block : Unsigned_32; PM1EventLength : Unsigned_8; PM1ControlLength : Unsigned_8; PM2ControlLength : Unsigned_8; PMTimerLength : Unsigned_8; GPE0BlockLength : Unsigned_8; GPE1BlockLength : Unsigned_8; GPE1Base : Unsigned_8; cStateControl : Unsigned_8; pLevel2Latency : Unsigned_16; pLevel3Latency : Unsigned_16; flushSize : Unsigned_16; flushStride : Unsigned_16; dutyOffset : Unsigned_8; dutyWidth : Unsigned_8; dayAlarm : Unsigned_8; monthAlarm : Unsigned_8; century : Unsigned_8; -- RTC index into RTC RAM if not 0 intelBootArch : Unsigned_16; -- IA-PC boot architecture flags reserved2 : Unsigned_8; -- always 0 flags : Unsigned_32; -- fixed feature flags resetRegister : GenericAddressStructure; resetValue : Unsigned_8; armBootArch : Unsigned_16; fadtMinorVersion : Unsigned_8; exFirmwareControl : Unsigned_64; exDsdt : Unsigned_64; exPM1AEventBlock : GenericAddressStructure; exPM1BEventBlock : GenericAddressStructure; exPM1AControlBlock : GenericAddressStructure; exPM1BControlBlock : GenericAddressStructure; exPM2ControlBlock : GenericAddressStructure; exPMTimerBlock : GenericAddressStructure; exGPE0Block : GenericAddressStructure; exGPE1Block : GenericAddressStructure; -- ACPI 6 fields (not supported yet) --sleepControlReg : GenericAddressStructure; --sleepStatusReg : GenericAddressStructure; 
    --hypervisorVendor : Unsigned_64;
end record with Size => 244 * 8;

for FADTRecord use record
    header             at 0   range 0 .. 287;
    firmwareControl    at 36  range 0 .. 31;
    dsdt               at 40  range 0 .. 31;
    reserved1          at 44  range 0 .. 7;
    powerMgmtProfile   at 45  range 0 .. 7;
    sciInterrupt       at 46  range 0 .. 15;
    smiCommand         at 48  range 0 .. 31;
    acpiEnable         at 52  range 0 .. 7;
    acpiDisable        at 53  range 0 .. 7;
    S4BIOSReq          at 54  range 0 .. 7;
    pStateControl      at 55  range 0 .. 7;
    PM1AEventBlock     at 56  range 0 .. 31;
    PM1BEventBlock     at 60  range 0 .. 31;
    PM1AControlBlock   at 64  range 0 .. 31;
    PM1BControlBlock   at 68  range 0 .. 31;
    PM2ControlBlock    at 72  range 0 .. 31;
    PMTimerBlock       at 76  range 0 .. 31;
    GPE0Block          at 80  range 0 .. 31;
    GPE1Block          at 84  range 0 .. 31;
    PM1EventLength     at 88  range 0 .. 7;
    PM1ControlLength   at 89  range 0 .. 7;
    PM2ControlLength   at 90  range 0 .. 7;
    PMTimerLength      at 91  range 0 .. 7;
    GPE0BlockLength    at 92  range 0 .. 7;
    GPE1BlockLength    at 93  range 0 .. 7;
    GPE1Base           at 94  range 0 .. 7;
    cStateControl      at 95  range 0 .. 7;
    pLevel2Latency     at 96  range 0 .. 15;
    pLevel3Latency     at 98  range 0 .. 15;
    flushSize          at 100 range 0 .. 15;
    flushStride        at 102 range 0 .. 15;
    dutyOffset         at 104 range 0 .. 7;
    dutyWidth          at 105 range 0 .. 7;
    dayAlarm           at 106 range 0 .. 7;
    monthAlarm         at 107 range 0 .. 7;
    century            at 108 range 0 .. 7;
    intelBootArch      at 109 range 0 .. 15;
    reserved2          at 111 range 0 .. 7;
    flags              at 112 range 0 .. 31;
    resetRegister      at 116 range 0 .. 95;
    resetValue         at 128 range 0 .. 7;
    armBootArch        at 129 range 0 .. 15;
    fadtMinorVersion   at 131 range 0 .. 7;
    exFirmwareControl  at 132 range 0 .. 63;
    exDsdt             at 140 range 0 .. 63;
    exPM1AEventBlock   at 148 range 0 .. 95;
    exPM1BEventBlock   at 160 range 0 .. 95;
    exPM1AControlBlock at 172 range 0 .. 95;
    exPM1BControlBlock at 184 range 0 .. 95;
    exPM2ControlBlock  at 196 range 0 .. 95;
    exPMTimerBlock     at 208 range 0 .. 95;
    exGPE0Block        at 220 range 0 .. 95;
    exGPE1Block        at 232 range 0 .. 95;

    -- ACPI 6 fields
    --sleepControlReg  at 244 range 0 .. 95;
    --sleepStatusReg   at 256 range 0 .. 95;
    --hypervisorVendor at 268 range 0 .. 63;
end record;

...

setup_bsp:
    ; Setup our kernel stack.
    mov rsp, qword (STACK_TOP)

    ; Add a stack canary to the bottom of the primary stack for CPU #0
    mov rax, qword (STACK_TOP - PER_CPU_STACK_SIZE + SEC_STACK_SIZE)
    mov rbx, 0xBAD_CA11_D37EC7ED
    mov [rax], rbx

    ; Save rdi, rsi so adainit doesn't clobber them
    push rdi
    push rsi

    ; Initialize with adainit for elaboration prior to entering Ada.
    mov rax, qword adainit
    call rax

    ; Restore arguments to kmain
    pop rsi
    pop rdi

    ; Call into Ada code
    mov rax, qword kmain
    call rax

...

GNAT Community 2020 is here!
https://blog.adacore.com/gnat-community-2020-is-here
Tue, 26 May 2020 13:59:00 +0000
Nicolas Setton

We are happy to announce that the GNAT Community 2020 release is now available via https://www.adacore.com/download. Here are some release highlights:

GNAT compiler toolchain

The 2020 compiler includes tightening and enforcement of Ada rules, performance enhancements, and support for some Ada 202x features - watch this space for further news on this. The compiler back-end has been upgraded to GCC 9 on all platforms except Mac OS - see below for further information about this exception.

ASIS is no longer supported, and we encourage you to switch to Libadalang for all your code intelligence needs. GNAT Community 2019 remains available for legacy support of ASIS.

RISC-V 64-bit

This year we have added a toolchain for RISC-V 64-bit hosted on Linux - you can try it out on boards like the HiFive Unleashed - and we include the emulator for this platform as well.

IDE

This release includes GNAT Studio, the evolution of GPS, our multi-language IDE for Ada, SPARK, C, C++ and Python. Notable features are:

• A completely new engine for Ada/SPARK navigation, implemented via a language server based on Libadalang. This means, in particular, that navigation works without requiring you to compile the codebase first.
• Improved overall performance in the editors, the omnisearch, and the debugger.
• Several UI enhancements, especially the contextual menus, which have been reorganized.

Please also note that we no longer support GNAT Studio on Mac OS.

Libadalang

Libadalang, a library for parsing and semantic analysis of Ada code, has made a lot of progress in the past year. In this GNAT Community release, you'll find:

• A new app framework that allows you to scaffold your Libadalang project - see this blog post for more information.
• The Python-facing API is now compatible with Python 3.
• Support for Aggregate Projects has been added.

SPARK

For those looking to take their Ada programs to the next level, GNAT Community includes a complete SPARK toolchain, now including the Lemma library (doc).

Toolchain and development environment enhancements are:

• New SPARK submenus and key shortcuts in GNAT Studio.
• Parallel analysis of subprograms.
• Automatic target configuration for GNAT runtimes.

Proving engine enhancements are:

• Support for infinite-precision arithmetic in Ada.Numerics.Big_Numbers.Big_Integers/Big_Reals (doc).
• Support for partially initialized data in proof (doc).
• Detection of memory leaks by proof.
• Dead code detected by proof warnings.
• Improved floating-point support in the Alt-Ergo prover.

SPARK language enhancements are:

• Support for local borrowers as part of pointer support through ownership.
• Many fixes in the new pointer support based on ownership.
• Detection of wrap-around on modular arithmetic with annotation No_Wrap_Around (doc).
• Support for forward goto.
• Support for raise expressions (doc).
• Detection of unsafe use of Unchecked_Conversion (doc).
• New annotation Might_Not_Return on procedures (doc).
• Volatility refinement aspects supported for types (doc).
• Allow SPARK_Mode Off inside subprograms.
• Support for volatile variables to prevent compiler optimizations (doc).
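As a hedged illustration (this example is mine, not from the release notes), the new big-number library lets contracts be stated over unbounded integers, so the contract itself cannot overflow. The names below follow the Ada 202x Ada.Numerics.Big_Numbers.Big_Integers API:

```ada
with Ada.Numerics.Big_Numbers.Big_Integers;
use  Ada.Numerics.Big_Numbers.Big_Integers;

--  Sketch: a saturating add. The postcondition is evaluated over
--  Big_Integer, so the "+" in the contract cannot itself overflow.
function Saturating_Add (X, Y : Integer) return Integer with
  Post => To_Big_Integer (Saturating_Add'Result) =
            Min (Max (To_Big_Integer (X) + To_Big_Integer (Y),
                      To_Big_Integer (Integer'First)),
                 To_Big_Integer (Integer'Last));
```

Here Min and Max are the comparison functions declared in Big_Integers itself; the function name Saturating_Add is hypothetical.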
Support for Visual Studio Code

If you are using Visual Studio Code, we have written a prototype extension for Ada and SPARK as part of our work on the Ada Language Server: you can find it on the Visual Studio Marketplace.

Notes on Mac OS

Mac OS is becoming harder to maintain, especially in its latest versions, which require code signing for binaries. For the GNAT Community 2020 release we decided not to codesign and notarize the binaries, so you'll have to circumvent the protections: see the README for the specific instructions. We have also removed support for the ARM cross compiler hosted on this platform, as well as GNAT Studio.

From Ada to Platinum SPARK: A Case Study for Reusable Bounded Stacks
https://blog.adacore.com/from-ada-to-platinum-spark-a-case-study-for-reusable-bounded-stacks
Thu, 14 May 2020 14:17:00 +0000
Pat Rogers

1. Introduction

An effective approach to learning a new programming language is to implement data structures common to computer programming. This is an effective strategy because the problem to be solved is well understood, and several different forms of a given data structure are possible: bounded versus unbounded, sequential versus thread-safe, and so on. A clear understanding of the problem allows one to focus on the language details, and the multiple forms likely require a wide range of language features.

Fortunately, when learning SPARK, Ada programmers need not start from scratch. We can begin with an existing, production-ready Ada implementation of a common data structure and make the changes necessary to conform to SPARK. This approach is possible because the fundamental design, based on the principles of software engineering, is the same in both languages. We would have a package exporting a private type, with primitive operations manipulating that type; in other words, an abstract data type (ADT).
The type might be limited, and might be tagged, using the same criteria in both languages to decide. Those primitive operations that change state would be procedures, with functions designed to be "pure" and side effects avoided. As a result, the changes need not be fundamental or extensive, although they are important and in some cases subtle.

The chosen Ada component is one that I have had for decades and have used in real-world applications. Specifically, this component defines a sequential, bounded stack ADT. The enclosing package is a generic so that the type of data contained in the stack objects need not be hard-coded. By "sequential" I mean that the code is not thread-safe. By "bounded" I mean that it is backed by an array, which as usual entails a discriminant on the private type to set the upper bound of the internal array component. Client misuse of the Push and Pop routines, e.g., pushing onto a full stack, raises exceptions.

As Ada has evolved I have applied new features to make the code more robust; for example, the Push and Pop routines use preconditions to prevent callers from misusing the abstraction, raising exceptions from within the preconditions instead of the procedure bodies.

This blog entry describes the transformation of that Ada stack ADT into a completely proven SPARK implementation that relies on static verification instead of run-time enforcement of the abstraction's semantics. We will prove that there are no reads of unassigned variables, no array indexing errors, no range errors, no numeric overflow errors, no attempts to push onto a full stack, no attempts to pop from an empty stack, that subprogram bodies implement their functional requirements, and so on. As a result, we get a maximally robust implementation of a reusable stack abstraction providing all the facilities required for production use.

The transformation will occur in phases, following the adoption levels described in section 2.
Each adoption level introduces more rigor and thus defines a simple, incremental transition approach.

Note that I assume familiarity with Ada, including preconditions and postconditions. Language details can be obtained from the online learning facilities available at https://learn.adacore.com/, an interactive site allowing one to enter, compile, and execute Ada programs in a web browser. We also assume a degree of familiarity with SPARK. That same web site provides a similar interactive environment and materials for learning SPARK, including formal proof.

2. SPARK Adoption Levels

In 2016, AdaCore collaborated with Thales in a series of experiments on the application of SPARK to existing software projects written in Ada. The resulting document presents a set of guidelines for adopting formal verification in existing projects. These guidelines are arranged in terms of five levels of software assurance, in increasing order of benefits and costs. The levels are named Stone, Bronze, Silver, Gold and Platinum. Successfully reaching a given level requires achieving the goals of the previous levels as well.

The guidelines were developed jointly by AdaCore and Thales for the adoption of the SPARK language technology at Thales but are applicable across a wide range of application domains. The document is available online: http://www.adacore.com/knowled...

2.1 Stone Level

The goal at the Stone level is to identify as much code as possible that belongs to the SPARK subset. That subset provides a strong semantic coding standard that enforces safer use of Ada language features and forbids those features precluding analysis (e.g., exception handlers). The result is potentially more understandable, maintainable code.

2.2 Bronze Level

The goal at the Bronze level is to verify initialization and correct data flow, as indicated by the absence of GNATprove messages during SPARK flow analysis.
Flow analysis detects programming errors such as reading uninitialized data, problematic aliasing between formal parameters, and data races between concurrent tasks. In addition, GNATprove checks unit specifications for the actual data read or written, and the flow of information from inputs to outputs. As one can see, this level provides significant benefits, and it can be reached at comparatively low cost. There are no proofs attempted at this level, only data and flow analyses.

2.3 Silver Level

The goal at the Silver level is to statically prove the absence of run-time errors (AoRTE), i.e., that no exceptions are raised. Proof at this level detects programming errors such as division by zero, array indexes that are out of bounds, and numeric overflow (integer, fixed-point and floating-point), among others. These errors are detected via the implicit language-defined checks that raise language-defined exceptions. The checks themselves preclude a number of significant situations, including, for example, buffer overflow, which is often exploited to inject malicious executable code. Preconditions, among other additions, may be required to prove these checks.

To illustrate the benefit and part of the cost of achieving the Silver level, consider the way the Ada version of the stack ADT uses preconditions for this purpose. (The complete Ada implementation is explored in section 4.1.) First, here is the full declaration for type Stack in the Ada package private part:

   type Content is array (Positive range <>) of Element;

   type Stack (Capacity : Positive) is record
      Values : Content (1 .. Capacity);
      Top    : Natural := 0;
   end record;

The type Element represents the kind of individual values contained by stack objects. Top is used as the index into the array Values and can be zero. The Values array uses 1 for the lower index bound, so when Top is zero the enclosing stack object is logically empty.
The following function checks for that condition:

   function Empty (This : Stack) return Boolean is
     (This.Top = 0);

Consider, then, a function using Empty as a precondition. The function takes a stack parameter as input and returns the Element value at the logical top of the stack:

19    function Top_Element (This : Stack) return Element with
20      Pre => not Empty (This);

Given the precondition on line 20, within the function completion we know that Top has a value that is a potentially valid array index. (We'll also have to be more precise about Top's upper bound, as explained later in section 4.4.) There is no need for defensive code, so the body is simply as follows:

57    function Top_Element (This : Stack) return Element is
58      (This.Values (This.Top));

If we did not have the precondition specified, GNATprove would issue a message:

58:24: medium: array index check might fail, (e.g. when This = (…, Top => 0) and …)

The message shows an example situation in which the check could fail: Top is zero, i.e., the stack is empty. (We have elided some of the message content to highlight the part mentioning Top.)

GNATprove will attempt to prove, statically, that the preconditions hold at every call site, flagging those calls, if any, in which the preconditions might not hold. Those failures must be addressed at the Silver level because the preconditions are necessary to the proof of absence of run-time errors.

As you can see, the Silver level provides highly significant benefits, but it does require more contracts and potentially complex changes to the code. The effort required to achieve this level can be high. Arguably, however, this level should be the minimum target level, especially if the application executable is to be deployed with run-time checks disabled.

2.4 Gold Level

The goal at the Gold level is proof of key integrity properties. These properties are typically derived from software requirements but also include maintaining critical data invariants throughout execution.
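As a hedged illustration of the kind of contract the Gold level adds, an integrity property for Push might be expressed as a postcondition. The query names below are the stack package's own, but this exact contract is a sketch of mine, not necessarily the article's code:

```ada
--  Sketch only: a Gold-level functional contract for Push.
--  Full, Empty, Extent and Top_Element are the package's queries.
procedure Push (This : in out Stack; Item : Element) with
  Pre  => not Full (This),
  Post => not Empty (This)
            and then Top_Element (This) = Item
            and then Extent (This) = Extent (This)'Old + 1;
```

Such a postcondition states what Push accomplishes, not merely what it requires, so GNATprove can use it both to verify the body and to reason about callers.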
Working at this level assumes prior completion of the Silver level, ensuring program integrity such that control flow cannot be circumvented through run-time errors and data cannot be corrupted. Verification at this level is also expected to pass without any violations.

Key integrity properties are expressed as additional preconditions and postconditions beyond those used for defensive purposes. In addition, the application may explicitly raise application-defined exceptions to signal violations of integrity properties. GNATprove will attempt to prove that the code raising an exception is never reached, and thus that the property violation never occurs. This approach may also require further proof-oriented code.

The Gold level provides extremely significant benefits. In particular, it can be less expensive to prove at this level than to test to the same degree of confidence. However, the analysis may take a long time, may require adding more precise types (ranges), and may require adding more preconditions and postconditions. Even if a property is provable, automatic provers may fail to prove it due to limitations of the provers, requiring either manual proof or, alternatively, testing.

2.5 Platinum Level

The goal at the Platinum level is nothing less than full functional proof of the requirements, including the functional unit-level requirements but also any abstract requirements such as, for example, safety and security. As with the Gold level, the application code must pass SPARK analysis without any violations. Furthermore, at the Platinum level GNATprove must verify complete user specifications for type invariants, preconditions, postconditions, type predicates, loop variants, and loop termination. The effort to achieve the Platinum level is high, so high that this level is not recommended during initial adoption of SPARK.

3.
Development Environment and Configuration

When we say we use SPARK, we mean that we develop the sources in the SPARK language, but also that we use the SPARK analysis tool to examine and verify those sources. We developed our sources in GNAT Studio (formerly GPS), a multi-lingual IDE supporting both Ada and SPARK, among others. The SPARK analysis tool is named GNATprove, a command-line tool integrated with GNAT Studio. GNAT Studio facilitates invocation of GNATprove with control over switches and source files, providing traversable results and even, if need be, interactive proof.

3.1 The Provers

A critical concept for using GNATprove is that it transparently invokes third-party "provers" to analyze the given source files. These provers are somewhat specialized in their ability to analyze specific semantics expressed by the source code. As a result, invocation of a series of provers may be required before some source code is successfully proven. In addition, we may need to ask the provers to "try harder" when attempting to analyze difficult situations. GNATprove can do both for us via the "--level=n" switch, where "n" is a number from 0 to 4 indicating increasing strength of analysis and additional provers invoked. In proving our stack implementation we use level 4.

3.2 Language-Defined Run-time Checks

GNATprove is also integrated with the GNAT Ada compiler, including the analysis of language-defined run-time checks produced by the compiler. GNATprove attempts to verify that no exceptions are raised due to these checks. It will do so even if we suppress the checks with compiler switches or pragma Suppress, so we can interpret the lack of corresponding messages as successful verification of those checks.

Integer overflow checks are a special case and, as a result, have a dedicated GNAT switch that affects whether that specific check is generated by the compiler.
They are a special case because, in addition to the functional code, they may appear in the logical assertions about the functional code, including subprogram preconditions and postconditions. In these contexts, we might expect them to behave mathematically, without implementation bounds. For example, consider the following declaration for a procedure that enters a log entry into a file:

 5    Entry_Num : Natural := 0;
 6
 7    procedure Log (This : String) with
 8      Pre    => Entry_Num + 1 <= Integer'Last,
 9      Global => (In_Out => Entry_Num);

The procedure body increments Entry_Num by one and then prepends the result to the string passed as the log entry. This addition in the body might overflow, but the issue under consideration is the addition in the precondition on line 8. If Entry_Num is Integer'Last at the point of the call, the addition on line 8 will overflow, as GNATprove indicates:

8:26: medium: overflow check might fail (e.g. when Entry_Num = Natural'Last)

We could revise the code so that the expression cannot overflow:

      Pre => Entry_Num <= Integer'Last - 1,

although that is slightly less readable. Other alternatives within the code are possible as well.

However, with regard to switches pertinent to check generation, GNAT provides the "-gnato" switch that allows us to control how integer overflow is treated. (There is a pragma as well, with the same effects.) We can use that switch to have the compiler implement integer arithmetic mathematically, without bounds, the way we might conceptually expect it to work within logical, non-functional assertions. As a result, no integer overflow checks will be generated. The default effect for the switch, and the default if the switch is not present, is to enable overflow checks in both functional and assertion code, so we just need to be aware of non-default usage when we want to determine whether integer overflow checks have been verified. (See the SPARK User Guide, section 5.7 "Overflow Modes" for the switch parameters.)
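For concreteness, such a switch setting might appear in a project file as follows. This is a hypothetical sketch (the project name is invented); per the GNAT documentation, "-gnato11" selects strict, checked overflow in both general code and assertions:

```ada
--  Hypothetical GPR excerpt: enable strict overflow checking in
--  both functional code and assertion expressions.
project Stacks is
   package Compiler is
      for Switches ("Ada") use ("-gnato11");
   end Compiler;
end Stacks;
```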
In our GNAT project file, the switch is explicitly set to enable overflow checks in both the functional code and the assertion code.

3.3 Source Code File Organization

The main program declares objects of a type Stack able to contain character values. That Stack type is provided by the package Character_Stacks, which is an instantiation of a generic package defining a stack abstract data type. The instantiation is specified such that objects of the resulting Stack type can contain character values.

Logically, there are four source files in the application: two (declaration and body) for the generic package, one for the instantiation of that generic package, and one containing the demonstration main subprogram. Operationally, however, there are multiple source files for the generic package. Rather than have one implementation that we alter as we progress through the SPARK adoption levels, we have chosen to have a distinct generic package for each level. Each generic package implements a common stack ADT in a manner consistent with an adoption level. The differences among them reflect the changes required for the different levels. This approach makes it easier to keep the differences straight when examining the code. Furthermore, we can apply the proof analyses to a conceptually common abstraction at arbitrary adoption levels without having to alter the code.

In addition to the content differences required by the adoption levels, each generic package name reflects the corresponding level. We have generic package Bounded_Stacks_Stone for the Stone level, Bounded_Stacks_Gold for the Gold level, and so on. Therefore, although the instantiation is always named Character_Stacks, we have multiple generic packages available to declare the one instantiation used.

There are also multiple files for the instantiations. Each instantiation is located within a dedicated source file corresponding to a given adoption level (lines 2 and 3 below).
For example, here is the content of the file providing the instance for the Stone level:

1    pragma Spark_Mode (On);
2    with Bounded_Stacks_Stone;
3    package Character_Stacks is new Bounded_Stacks_Stone
4      (…);

The file names for these instances must be unique but are otherwise arbitrary. For the above, the file name is "character_stacks-stone.ads" because it is the instance of the Stone-level generic.

Only one of these instances can be used when GNATprove analyzes the code (or when building the executable). To select among them we use a "scenario variable" defined in the GNAT project file, with scenario values matching the adoption level names. In the IDE this scenario variable is presented with a pull-down menu, so all we must do to work at a given level is select the adoption level name in the pull-down list. The project file then selects the instantiation file corresponding to the level, e.g., "character_stacks-silver.ads" when the Silver level is selected.

There are also multiple source files for the main program. Rather than have one file that must be edited as we prove the higher levels, we have two: one for all levels up to and including the Silver level, and one for all levels above that. The scenario variable also determines which of these two source files is active.
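A scenario variable of this kind might be declared in the project file as sketched below. The variable name, its default, and the external name are my assumptions; the actual project may differ:

```ada
--  Hypothetical GPR excerpt: a typed scenario variable selects which
--  instantiation file provides the Character_Stacks spec.
type Level_Type is ("stone", "bronze", "silver", "gold", "platinum");
Level : Level_Type := external ("Adoption_Level", "stone");

package Naming is
   for Spec ("Character_Stacks") use "character_stacks-" & Level & ".ads";
end Naming;
```

The IDE renders such a typed variable as the pull-down menu described above.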
However, because there are only four total files required at any one time, we usually invoke the IDE action that has GNATprove analyze all the files in the closure of the application. The instantiation file corresponding to the scenario variable's current selection will be analyzed; other instantiation files are ignored. This approach also verifies the main program's calls to the stack routines, which is vital to the higher adoption levels.

4. Implementations Per Adoption Level

Our first main procedure, used for all adoption levels up through Silver, declares two stack objects (line 6 below) and manipulates them via the abstraction's interface:

 1    with Ada.Text_IO;       use Ada.Text_IO;
 2    with Character_Stacks;  use Character_Stacks;
 3
 4    procedure Demo_AoRTE with SPARK_Mode is
 5
 6       S1, S2 : Stack (Capacity => 10);  -- arbitrary
 7
 8       X, Y : Character;
 9
10    begin
11       pragma Assert (Empty (S1) and Empty (S2));
12       pragma Assert (S1 = S2);
13       Push (S1, 'a');
14       Push (S1, 'b');
15       Put_Line ("Top of S1 is '" & Top_Element (S1) & "'");
16
17       Pop (S1, X);
18       Put_Line ("Top of S1 is '" & Top_Element (S1) & "'");
19       Pop (S1, Y);
20       pragma Assert (Empty (S1) and Empty (S2));
21       Put_Line (X & Y);
22
23       Reset (S1);
24       Put_Line ("Extent of S1 is" & Extent (S1)'Image);
25
26       Put_Line ("Done");
27    end Demo_AoRTE;

This is the "demo_aorte.adb" file. The purpose of the code is to illustrate issues found at the initial levels, including proof in a caller context. It has no other functional purpose whatsoever. As we progress through the levels, we will add more assertions to highlight more issues, as will be seen in the other main procedure in the "demo_gold.adb" file.

4.1 Initial Ada Implementation

The initial version defines a canonical representation of a sequential, bounded stack. As an abstract data type, the Stack type is declared as a private type with routines manipulating objects of the type.
The type is declared within a generic package that has one generic formal parameter, a type representing the kind of elements contained by Stack objects. This approach is used in all the implementations. Some routines have "defensive" preconditions to ensure correct functionality. They raise exceptions, declared within the package, when the preconditions do not hold.

The generic package in Ada is declared as follows:

 1    generic
 2       type Element is private;
 3    package Bounded_Stacks_Magma is
 4
 5       type Stack (Capacity : Positive) is private;
 6
 7       procedure Push (This : in out Stack; Item : in Element) with
 8         Pre => not Full (This) or else raise Overflow;
 9
10       procedure Pop (This : in out Stack; Item : out Element) with
11         Pre => not Empty (This) or else raise Underflow;
12
13       function Top_Element (This : Stack) return Element with
14         Pre => not Empty (This) or else raise Underflow;
15       --  Returns the value of the Element at the "top" of This
16       --  stack, i.e., the most recent Element pushed. Does not
17       --  remove that Element or alter the state of This stack
18       --  in any way.
19
20       overriding function "=" (Left, Right : Stack) return Boolean;
21
22       procedure Copy (Destination : out Stack; Source : Stack) with
23         Pre => Destination.Capacity >= Extent (Source)
24                or else raise Overflow;
25       --  An alternative to predefined assignment that does not
26       --  copy all the values unless necessary. It only copies
27       --  the part "logically" contained, so is more efficient
28       --  when Source is not full.
29
30       function Extent (This : Stack) return Natural;
31       --  Returns the number of Element values currently
32       --  contained within This stack.
33
34       function Empty (This : Stack) return Boolean;
35
36       function Full (This : Stack) return Boolean;
37
38       procedure Reset (This : out Stack);
39
40       Overflow  : exception;
41       Underflow : exception;
42
43    private
44
45       type Content is array (Positive range <>) of Element;
46
47       type Stack (Capacity : Positive) is record
48          Values : Content (1 .. Capacity);
49          Top    : Natural := 0;
50       end record;
51
52    end Bounded_Stacks_Magma;

This version is below the Stone level because it is not within the SPARK subset, due to the raise expressions on lines 8, 11, 14, and 24. We will address those constructs in the Stone version.

The generic package body is shown below.

 1    package body Bounded_Stacks_Magma is
 2
 3       procedure Reset (This : out Stack) is
 4       begin
 5          This.Top := 0;
 6       end Reset;
 7
 8       function Extent (This : Stack) return Natural is
 9         (This.Top);
10
11       function Empty (This : Stack) return Boolean is
12         (This.Top = 0);
13
14       function Full (This : Stack) return Boolean is
15         (This.Top = This.Capacity);
16
17       procedure Push (This : in out Stack; Item : in Element) is
18       begin
19          This.Top := This.Top + 1;
20          This.Values (This.Top) := Item;
21       end Push;
22
23       procedure Pop (This : in out Stack; Item : out Element) is
24       begin
25          Item := This.Values (This.Top);
26          This.Top := This.Top - 1;
27       end Pop;
28
29       function Top_Element (This : Stack) return Element is
30         (This.Values (This.Top));
31
32       function "=" (Left, Right : Stack) return Boolean is
33         (Left.Top = Right.Top and then
34          Left.Values (1 .. Left.Top) = Right.Values (1 .. Right.Top));
35
36       procedure Copy (Destination : out Stack; Source : Stack) is
37          subtype Contained is Integer range 1 .. Source.Top;
38       begin
39          Destination.Top := Source.Top;
40          Destination.Values (Contained) := Source.Values (Contained);
41       end Copy;
42
43    end Bounded_Stacks_Magma;

Note that both procedure Copy and function "=" are defined for the sake of increased efficiency when the objects in question are not full. The procedure only copies the slice of Source.Values that represents the Element values logically contained at the time of the call. The language-defined assignment operation, in contrast, would copy the entire contents.
Similarly, the overridden equality operator only compares the array slices, rather than the entire arrays, after first ensuring the stacks are the same logical size. However, in addition to efficiency, the "=" function is also required for proper semantics. The comparison should not compare array elements that are not, and perhaps never have been, currently contained in the stack objects. The predefined equality would do so and must, therefore, be replaced.

The changes to the body made for the sake of SPARK will amount to moving certain bodies to the package declaration, so we will not show the package body again. The full Platinum implementation, both declaration and body, is provided in section 6.

4.2 Stone Implementation

The Stone level version of the package cannot have the "raise expressions" in the preconditions because they are not in the SPARK subset. The rest of the preconditions are unchanged. Here are the updated declarations for Push and Pop, for example:

   procedure Push (This : in out Stack; Item : in Element) with
     Pre => not Full (This);

   procedure Pop (This : in out Stack; Item : out Element) with
     Pre => not Empty (This);

When we get to the adoption levels involving proof, GNATprove will attempt to verify statically that the preconditions hold at each call site. Either that verification will succeed, or we will know that we must change the calling code accordingly. Therefore, the prohibited "raise expressions" are not needed. The exception declarations, although within the subset, are also removed because they are no longer needed. The remaining code is wholly within the SPARK subset, so we have reached the Stone level.

4.3 Bronze Implementation

The Bronze level is about initialization and data flow.
When we apply GNATprove to the Stone version in flow analysis mode, it issues messages on the declarations of procedures Copy and Reset in the generic package declaration:

   medium: "Destination.Values" might not be initialized in "Copy"
   high: "This.Values" is not initialized in "Reset"

The procedure declarations are repeated below for reference:

   procedure Copy (Destination : out Stack; Source : Stack) with
     Pre => Destination.Capacity >= Extent (Source);

   procedure Reset (This : out Stack);

Both messages result from the fact that the stack parameters being updated have mode “out” specified. That mode, in SPARK, means more than it does in Ada: it indicates that the actual parameters are fully assigned by the procedures. These two procedure bodies do not do so. Procedure Reset simply sets Top to zero, because that is all that a stack requires, at run-time, to be fully reset; it does nothing at all to the Values array component. Likewise, procedure Copy may assign only part of the array, i.e., just those array components that are logically part of the Source object. (Of course, if Source is full, the entire array is copied.)

In both subprograms our notion of being fully assigned is less than SPARK requires, so we have two choices: either we assign values to all components of the record, or we change the modes to “in out.” These two procedures exist for the sake of efficiency, i.e., not writing any more data than logically necessary. Having Reset assign anything to the array component would defeat the purpose. For the same reason, having Copy assign more than the partial slice (when the stack is not full) is clearly inappropriate. Therefore, we change the mode to “in out” for these two subprograms. In other cases we might instead change the implementations to fully assign the objects.

The other change required for initialization concerns the type Stack itself.
In the main subprogram, GNATprove complains that the two objects of type Stack have not been initialized:

   warning: "S1" may be referenced before it has a value
   high: private part of "S1" is not initialized
   warning: "S2" may be referenced before it has a value
   high: private part of "S2" is not initialized
   high: private part of "S1" is not initialized

Our full definition of the Stack type in the private part is such that default initialization (i.e., elaboration of object declarations without an explicit initial value) will assign the record components so that a stack will behave as if initially empty. Specifically, default initialization assigns zero to Top (line 5 below), and since function Empty examines only the Top component, such objects are empty.

1    type Content is array (Positive range <>) of Element;
2
3    type Stack (Capacity : Positive) is record
4       Values : Content (1 .. Capacity);
5       Top    : Natural := 0;
6    end record;

Proper run-time functionality of the Stack ADT does not require the Values array component to be assigned by default initialization. But just as with Reset and Copy, although this approach is sufficient at run-time, the resulting objects will not be fully initialized in SPARK, which analyzes the code prior to run-time. As a result, we need to assign an array aggregate to the Values component as well.

Expressing the array aggregate is problematic because the array component type is the generic formal private type Element, with a private view within the package. Inside the generic package we don’t know how to construct a value of type Element, so we cannot construct an aggregate containing such values. Therefore, we add the Default_Value generic formal object parameter and use it to initialize the array components.
This new generic formal parameter, shown below on line 5, is added from the Bronze version onward:

1 generic
2    type Element is private;
3    -- The type of values contained by objects of type Stack
4
5    Default_Value : Element;
6    -- The default value used for stack contents. Never
7    -- acquired as a value from the API, but required for
8    -- initialization in SPARK.
9 package Bounded_Stacks_Bronze is

The full definition for type Stack then uses that parameter to initialize Values (line 2):

1    type Stack (Capacity : Positive) is record
2       Values : Content (1 .. Capacity) := (others => Default_Value);
3       Top    : Natural := 0;
4    end record;

With those changes in place, flow analysis completes without further complaint. The implementation has reached the Bronze level.

The need for that additional generic formal parameter is unfortunate because it becomes part of the user’s interface without any functional use. None of the API routines ever return it as such, and the actual value chosen is immaterial. Note that SPARK will not allow the aggregate to contain default components (line 2):

1    type Stack (Capacity : Positive) is record
2       Values : Content (1 .. Capacity) := (others => <>);
3       Top    : Natural := 0;
4    end record;

as per SPARK RM 4.3(1).

Alternatively, we could omit this generic formal object parameter if we used an aspect to promise that the objects are initially empty, and then manually justified any resulting messages. We will in fact add that aspect for other reasons, but we prefer to have proof as automated as possible, for convenience and to avoid human error.

Finally, although the data dependency contracts, i.e., the “Global” aspects, would be generated automatically, we add them explicitly, indicating that there are no intended accesses to any global objects.
For example, on line 3 in the following:

1    procedure Push (This : in out Stack; Item : Element) with
2      Pre    => not Full (This),
3      Global => null;

We do so because mismatches between reality and the generated contracts are not reported by GNATprove, and we prefer positive confirmation of our understanding of the dependencies.

The flow dependency contracts (the “Depends” aspects) can also be generated automatically. Unlike the data dependency contracts, however, these can usually be omitted from the code even though mismatches with the corresponding bodies are not reported. That lack of notification is not a problem because the generated contracts are safe: they express at least the dependencies that the code actually exhibits, so all actual dependencies are covered. For example, a generated flow dependency will state that all outputs depend on all inputs, which is possible but not necessarily the case. However, overly conservative contracts can lead to otherwise-avoidable issues with proof, leading the developer to add precise contracts explicitly when necessary.

The other reason to express them explicitly is when we want to prove data flow dependencies as part of the abstract properties, for example data flowing only between units at appropriate security levels. We are not doing so in this case.

4.4 Silver Implementation

If we try to prove the Bronze level version of the generic package, GNATprove will complain about various run-time checks that cannot be proved in the generic package body. The Silver level requires these checks to be proven not to fail, i.e., not to raise exceptions.
The check messages are as follows, preceded by the code fragments they reference, with some message content elided in order to emphasize the parts that lead us to the solution:

37    procedure Push (This : in out Stack; Item : in Element) is
38    begin
39       This.Top := This.Top + 1;
40       This.Values (This.Top) := Item;
41    end Push;

   bounded_stacks_silver.adb:39:28: medium: overflow check might fail, …
     (e.g. when This = (…, Top => Natural'Last) …
   bounded_stacks_silver.adb:40:24: medium: array index check might fail, …
     (e.g. when This = (…, Top => 2) and This.Values'First = 1 and This.Values'Last = 1)

47    procedure Pop (This : in out Stack; Item : out Element) is
48    begin
49       Item := This.Values (This.Top);
50       This.Top := This.Top - 1;
51    end Pop;

   bounded_stacks_silver.adb:49:32: medium: array index check might fail, …
     (e.g. when This = (…, Top => 2) and This.Values'First = 1 and This.Values'Last = 1)

57    function Top_Element (This : Stack) return Element is
58      (This.Values (This.Top));

   bounded_stacks_silver.adb:58:24: medium: array index check might fail, …
     (e.g. when This = (…, Top => 2) and This.Values'First = 1 and This.Values'Last = 1)

64    function "=" (Left, Right : Stack) return Boolean is
65      (Left.Top = Right.Top and then
66       Left.Values (1 .. Left.Top) = Right.Values (1 .. Right.Top));

   bounded_stacks_silver.adb:66:12: medium: range check might fail, …
     (e.g. when Left = (Capacity => 1, …, Top => 2) …
   bounded_stacks_silver.adb:66:43: medium: range check might fail, …
     (e.g. when Right = (Capacity => 1, …, Top => 2) …

72    procedure Copy (Destination : in out Stack; Source : Stack) is
73       subtype Contained is Integer range 1 .. Source.Top;
74    begin
75       Destination.Top := Source.Top;
76       Destination.Values (Contained) := Source.Values (Contained);
77    end Copy;

   bounded_stacks_silver.adb:76:47: medium: range check might fail, …
     (e.g. when Destination = (Capacity => 1, …) and Source = (Capacity => 1, …, Top => 2))

All of these messages indicate that the provers do not know that the Top component is always in the range 0 .. Capacity. The code has not said so, and indeed, there is no way to use a discriminant in a scalar record component declaration to constrain the component’s range. This is what we would write for the record type implementing type Stack in the full view, if we could (line 3):

1    type Stack (Capacity : Positive) is record
2       Values : Content (1 .. Capacity) := (others => Default_Value);
3       Top    : Natural range 0 .. Capacity := 0;
4    end record;

but that range constraint on Top is not legal. The reason it is illegal is that the application can change the value of a discriminant at run-time, under controlled circumstances, but there is no way at run-time to change the range checks in the object code generated by the compiler.

However, with Ada and SPARK there is now a way to express the constraint on Top, and the provers will recognize the meaning during analysis. Specifically, we apply a “subtype predicate” to the record type declaration (line 5):

1    type Stack (Capacity : Positive) is record
2       Values : Content (1 .. Capacity) := (others => Default_Value);
3       Top    : Natural := 0;
4    end record with
5      Predicate => Top in 0 .. Capacity;

This aspect informs the provers that the Top component for any object of type Stack is always in the range 0 .. Capacity. That addition successfully addresses all the messages about the generic package body. Note that the provers will verify the predicate too.

However, GNATprove also complains about the main program.
Consider that the first two assertions in the main procedure are not verified:

10 begin
11    pragma Assert (Empty (S1) and Empty (S2));
12    pragma Assert (S1 = S2);

GNATprove emits:

   11:19: medium: assertion might fail, cannot prove Empty (S1)
   12:19: medium: assertion might fail, cannot prove S1 = S2

We can address the issue for function Empty, partly, by adding another aspect to the declaration of type Stack, this time to the visible declaration:

   type Stack (Capacity : Positive) is private with
     Default_Initial_Condition => Empty (Stack);

The new aspect indicates that default initialization results in stack objects that are empty, making the intended initial object state explicit and, above all, verifiable. We will be notified if GNATprove determines that the aspect does not hold.

That new aspect handles the first assertion in the main program, on line 11, but GNATprove complains throughout the main procedure that the preconditions involving Empty and Full cannot be proven. For example:

13    Push (S1, 'a');
14    Push (S1, 'b');
15    Put_Line ("Top of S1 is '" & Top_Element (S1) & "'");

GNATprove emits:

   13:06: medium: precondition might fail, cannot prove not Full (This)
   14:06: medium: precondition might fail, cannot prove not Full (This) [possible explanation: call at line 13 should mention This (for argument S1) in a postcondition]
   15:35: medium: precondition might fail, cannot prove not Empty (This) [possible explanation: call at line 14 should mention This (for argument S1) in a postcondition]

Note the “possible explanations” that GNATprove gives us. These are clear indications that we are not specifying sufficient postconditions. Remember that when analyzing code that includes a call to some procedure, the provers’ knowledge of the call’s effect is provided entirely by the procedure’s postcondition. That postcondition might be insufficient, especially if it is absent!
Therefore, we must tell the provers about the effects of calling Push and Pop, as well as the other routines that change state. We add a new postcondition on Push (line 3):

1    procedure Push (This : in out Stack; Item : Element) with
2      Pre    => not Full (This),
3      Post   => Extent (This) = Extent (This)'Old + 1,
4      Global => null;

The new postcondition expresses the fact that the Stack contains one more Element value after the call. This is sufficient because the provers know that function Extent is simply the value of Top:

   function Extent (This : Stack) return Natural is
     (This.Top);

Hence the provers know that Top is incremented by Push. The same approach addresses the messages for Pop (line 3):

1    procedure Pop (This : in out Stack; Item : out Element) with
2      Pre    => not Empty (This),
3      Post   => Extent (This) = Extent (This)'Old - 1,
4      Global => null;

We said above that the provers know what the function Extent means. For that to be the case when verifying client calls, we must move the function completion from the generic package body to the generic package declaration. In addition, the function must be implemented as an “expression function,” which Extent already is (see above). With the expression functions in the package spec, the provers know the semantics of those functions automatically, as if each were given a postcondition restating the corresponding expression explicitly.

We also need functions Full and Empty to be known in this manner. Therefore, we move the Extent, Empty, and Full function completions, already expression functions, from the generic package body to the package declaration. We put them in the private part because these implementation details should not be exported to clients.

However, we have a potential overflow in the postcondition for Push, i.e., in the increment of the number of elements contained after Push returns (line 3 below). The postcondition for procedure Pop, of course, does not have that problem.
1    procedure Push (This : in out Stack; Item : Element) with
2      Pre    => not Full (This),
3      Post   => Extent (This) = Extent (This)'Old + 1,
4      Global => null;

The increment might overflow because Extent returns a value of subtype Natural, which could be the value Integer'Last. Hence the increment could raise Constraint_Error and the check cannot be verified. We must either apply the “-gnato” switch so that assertions can never overflow, or alternatively, declare a safe subrange so that the result of the addition cannot be greater than Integer'Last. Our choice is to declare a safe subrange because the effects are explicit in the code, as opposed to an external switch. Here are the added subtype declarations:

   subtype Element_Count is Integer range 0 .. Integer'Last - 1;
   -- The number of Element values currently contained
   -- within any given stack. The lower bound is zero
   -- because a stack can be empty. We limit the upper
   -- bound (minimally) to preclude overflow issues.

   subtype Physical_Capacity is Element_Count range 1 .. Element_Count'Last;
   -- The range of values that any given stack object can
   -- specify (via the discriminant) for the number of
   -- Element values the object can physically contain.
   -- Must be at least one.

We use the second subtype for the discriminant in the partial view for Stack (line 1):

1    type Stack (Capacity : Physical_Capacity) is private
2      with Default_Initial_Condition => Empty (Stack);

and both subtypes in the full declaration in the private part (lines 1, 3, and 5):

1    type Content is array (Physical_Capacity range <>) of Element;
2
3    type Stack (Capacity : Physical_Capacity) is record
4       Values : Content (1 .. Capacity) := (others => Default_Value);
5       Top    : Element_Count := 0;
6    end record with
7      Predicate => Top in 0 .. Capacity;

The function Extent is changed to return a value of the subtype Element_Count, so adding one in the postcondition cannot go past Integer'Last.
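The corresponding change to Extent might read as in the following sketch; the article describes the change but does not show the declaration, so its exact form is our reconstruction:

```ada
-- Sketch (our reconstruction): Extent now returns Element_Count,
-- whose upper bound is Integer'Last - 1, so the postcondition
-- expression Extent (This)'Old + 1 cannot exceed Integer'Last.
function Extent (This : Stack) return Element_Count is
  (This.Top);
```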
Overflow is precluded but note that there will now be range checks for GNATprove to verify. With these changes in place we have achieved the Silver level. There are no run-time check verification failures and the defensive preconditions are proven at their call sites. 4.5 Gold Implementation We will now address the remaining changes needed to reach the Gold level. The process involves iteratively attempting to prove the main program that calls the stack routines and makes assertions about the conditions that follow. This process will result in changes to the generic package, especially postconditions, so it will require verification along with the main procedure. Those additional postconditions may require additional preconditions as well. In general, a good way to identify postcondition candidates is to ask ourselves what conditions we, as the developers, know to be true after a call to the routine in question. Then we can add assertions after the calls to see if the provers can verify those conditions. If not, we extend the postcondition on the routine. For example, we can say that after a call to Push, the corresponding stack cannot be empty. Likewise, after a call to Pop, the stack cannot be full. These additions are not required for the sake of assertions or other preconditions because the Extent function already tells the provers what they need to know in this regard. However, they are good documentation and may be required to prove additional conditions added later. (That is the case, in fact, as will be shown.) To see what other postconditions are required, we now switch to the other main procedure, in the “demo_gold.adb” file. 
This version of the demo program includes a number of additional assertions:

 1 with Ada.Text_IO;       use Ada.Text_IO;
 2 with Character_Stacks;  use Character_Stacks;
 3
 4 procedure Demo_Gold with SPARK_Mode is
 5
 6    S1, S2 : Stack (Capacity => 10);  -- arbitrary
 7
 8    X, Y : Character;
 9
10 begin
11    pragma Assert (Empty (S1) and Empty (S2));
12    pragma Assert (S1 = S2);
13    Push (S1, 'a');
14    pragma Assert (not Empty (S1));
15    pragma Assert (Top_Element (S1) = 'a');
16    Push (S1, 'b');
17    pragma Assert (S1 /= S2);
18
19    Put_Line ("Top of S1 is '" & Top_Element (S1) & "'");
20
21    Pop (S1, X);
22    Put_Line ("Top of S1 is '" & Top_Element (S1) & "'");
23    Pop (S1, Y);
24    pragma Assert (X = 'b');
25    pragma Assert (Y = 'a');
26    pragma Assert (S1 = S2);
27    Put_Line (X & Y);
28
29    Push (S1, 'a');
30    Copy (Source => S1, Destination => S2);
31    pragma Assert (S1 = S2);
32    pragma Assert (Top_Element (S1) = Top_Element (S2));
33    pragma Assert (Extent (S1) = Extent (S2));
34
35    Reset (S1);
36    pragma Assert (Empty (S1));
37    pragma Assert (S1 /= S2);
38
39    Put_Line ("Done");
40 end Demo_Gold;

For example, we have added assertions after the calls to Copy and Reset, on lines 31 through 33 and 36 through 37, respectively. GNATprove now emits the following (elided) messages for those assertions:

   demo_gold.adb:31:19: medium: assertion might fail, cannot prove S1 = S2 (e.g. when S1 = (…, Top => 0) and S2 = (…, Top => 0)) [possible explanation: call at line 30 should mention Destination (for argument S2) in a postcondition]
   demo_gold.adb:36:19: medium: assertion might fail, cannot prove Empty (S1) … [possible explanation: call at line 35 should mention This (for argument S1) in a postcondition]

Note again the “possible explanation” hints.
For the first message we need to add a postcondition on Copy specifying that the value of the argument passed to Destination will be equal to that of the Source argument (line 3):

1    procedure Copy (Destination : in out Stack; Source : Stack) with
2      Pre    => Destination.Capacity >= Extent (Source),
3      Post   => Destination = Source,
4      Global => null;

We must also move the “=” function implementation to the package spec so that the provers will know its meaning. The function was already completed as an expression function, so moving it to the spec is all that is required.

For the second message, regarding the failure to prove that a stack is Empty after Reset, we add a postcondition to that effect (line 2):

1    procedure Reset (This : in out Stack) with
2      Post   => Empty (This),
3      Global => null;

The completion for function Empty was already moved to the package spec, earlier.

The implementations of procedure Copy and function “=” might have required explicit loops, likely requiring loop invariants, but using array slicing we can express the loops implicitly. Here is function “=” again, for example:

1    function "=" (Left, Right : Stack) return Boolean is
2      (Left.Top = Right.Top and then
3       Left.Values (1 .. Left.Top) = Right.Values (1 .. Right.Top));

The slice comparison on line 3 expresses an implicit loop for us, as does the slice assignment in procedure Copy. The function could have been implemented as follows, with an explicit loop:

 1    function "=" (Left, Right : Stack) return Boolean is
 2    begin
 3       if Left.Top /= Right.Top then
 4          -- They hold a different number of element values so
 5          -- cannot be equal.
 6          return False;
 7       end if;
 8       -- The two Top values are the same, and the arrays
 9       -- are 1-based, so the bounds are the same. Hence the
10       -- choice of Left.Top or Right.Top is arbitrary and
11       -- there is no need for index offsets.
12       for K in 1 .. Left.Top loop
13          if Left.Values (K) /= Right.Values (K) then
14             return False;
15          end if;
16          pragma Loop_Invariant
17            (Left.Values (1 .. K) = Right.Values (1 .. K));
18       end loop;
19       -- We didn't find a difference
20       return True;
21    end "=";

Note the loop invariant on lines 16 and 17. In some circumstances GNATprove will handle the invariants for us, but often it cannot. In practice, writing sufficient loop invariants is one of the more difficult facets of SPARK development, so the chance to avoid them is welcome.

Continuing, we know that after the body of Push executes, the top element contained in the stack will be the value passed to Push as an argument. But the provers cannot verify an assertion to that effect (line 15 below):

13    Push (S1, 'a');
14    pragma Assert (not Empty (S1));
15    pragma Assert (Top_Element (S1) = 'a');

GNATprove emits this message:

   demo_gold.adb:15:19: medium: assertion might fail, cannot prove Top_Element (S1) = 'a'

We must extend the postcondition for Push to state that Top_Element would return the value just pushed, as shown on line 4 below:

1    procedure Push (This : in out Stack; Item : Element) with
2      Pre    => not Full (This),
3      Post   => not Empty (This)
4                and then Top_Element (This) = Item
5                and then Extent (This) = Extent (This)'Old + 1,
6      Global => null;

Now the assertion on line 15 is verified successfully.

Recall that the precondition for function Top_Element is that the stack is not empty. We already have that condition in the postcondition (line 3), so the precondition for Top_Element is satisfied. We must use the short-circuit form for the conjunction, though, to control the order of evaluation so that “not Empty” is verified before Top_Element is called. The short-circuit form on line 4 necessitates the same form on line 5, per Ada rules. That triggers a subtle issue flagged by GNATprove. The short-circuit form, by definition, means that the evaluation of line 5 might not occur. If it is not evaluated, we’ve told the compiler to call Extent and make a copy of the result (via 'Old, on the right-hand side of “=”) that will not be needed.
Moreover, the execution of Extent might raise an exception. Therefore, the language disallows applying 'Old in any potentially unevaluated expression that might raise exceptions. As a consequence, in line 5 we cannot apply 'Old to the result of calling Extent. GNATprove issues this error message:

   prefix of attribute "Old" that is potentially unevaluated must denote an entity

We could address the error by changing line 5 to use Extent (This'Old) instead, but there is a potential performance difference between Extent (This)'Old and Extent (This'Old). With the former, only the result of the function call is copied, whereas with the latter, the value of the parameter is copied. Copying the parameter could take significant time and space if This is a large object. Of course, if the function returns a large value the copy will be large too, but in this case Extent only returns an integer.

In SPARK, unlike Ada, preconditions, postconditions, and assertions in general are verified statically, prior to execution, so there is no performance issue. Ultimately, though, the application will be executed. Having statically proven the preconditions and postconditions successfully, we can safely deploy the final executable without them enabled, but not all projects follow that approach (at least, not on that basis). Therefore, for the sake of emphasizing the idiom with typically better performance, we prefer applying 'Old to the function call in our implementation. We can tell GNATprove that this is a benign case, using a pragma in the package spec:

   pragma Unevaluated_Use_of_Old (Allow);

GNATprove will then allow use of 'Old on the call to function Extent and will ensure that no exceptions can be raised by the function.
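To make the trade-off concrete, here is a sketch contrasting the two idioms side by side; the names Push_V1 and Push_V2 are hypothetical, introduced only for comparison:

```ada
-- Hypothetical side-by-side comparison of the two 'Old idioms.

-- Copies only Extent's Integer result at subprogram entry; in a
-- potentially unevaluated context this form additionally needs
-- pragma Unevaluated_Use_of_Old (Allow).
procedure Push_V1 (This : in out Stack; Item : Element) with
  Pre  => not Full (This),
  Post => Extent (This) = Extent (This)'Old + 1;

-- Copies the entire Stack value at entry and then applies Extent;
-- always legal, but potentially a much larger copy at run-time.
procedure Push_V2 (This : in out Stack; Item : Element) with
  Pre  => not Full (This),
  Post => Extent (This) = Extent (This'Old) + 1;
```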
As with procedure Push, we can also use Top_Element to strengthen the postcondition for procedure Pop (line 4 below):

1    procedure Pop (This : in out Stack; Item : out Element) with
2      Pre    => not Empty (This),
3      Post   => not Full (This)
4                and Item = Top_Element (This)'Old
5                and Extent (This) = Extent (This)'Old - 1,
6      Global => null;

Line 4 states that the Item returned in the parameter to Pop is the value that would be returned by Top_Element prior to the call to Pop.

One last significant enhancement now remains to be made. Consider the assertions in the main procedure about the effects of Pop, on lines 24 and 25, repeated below:

21    Pop (S1, X);
22    Put_Line ("Top of S1 is '" & Top_Element (S1) & "'");
23    Pop (S1, Y);
24    pragma Assert (X = 'b');
25    pragma Assert (Y = 'a');

Previous lines had pushed 'a' and then 'b', in that order, onto S1. GNATprove emits this one message:

   25:19: medium: assertion might fail, cannot prove Y = 'a' (e.g. when Y = 'b')

The message is about the assertion on line 25 alone; the assertion on line 24 was verified. Also, the message indicates that Y could be some arbitrary character. We can conclude that the provers do not know enough about the state of the stack after a call to Pop. The postcondition requires strengthening.

The necessary postcondition extension reflects a unit-level functional requirement for both Push and Pop. If one considers that postconditions correspond to the low-level unit functional requirements (if not more), one can see why the postconditions must be complete. Identifying and expressing complete functional requirements is difficult in itself, and indeed the need for this additional postcondition content is not obvious at first. The unit-level requirement for both operations is that the prior array components within the stack are not altered, other than the one added or removed. We need to state that Push and Pop have not reordered them, for example.
Specifically, for Push we need to say that the new stack state has exactly the same prior array slice contents, ignoring the newly pushed value. For Pop, we need to say that the new state has exactly the prior array slice contents without the old value at the top. A new function can be used to express these requirements for both Push and Pop:

   function Unchanged (Invariant_Part, Within : Stack) return Boolean;

The Within parameter is a stack whose internal state will be compared against that of the Invariant_Part parameter. The name “Invariant_Part” is chosen to indicate the stack state that has not changed. The name "Within" is chosen for readability in named parameter associations on the calls. For example:

   Unchanged (X, Within => Y)

means that the Element values of X should be equal to precisely the corresponding values within Y.

However, this function is not one that users would call directly; we only need it for proof. Therefore, we mark the Unchanged function as a "ghost" function so that the compiler will neither generate code for it nor allow the application code to call it. The function is declared with that aspect (on line 2) as follows:

1    function Unchanged (Invariant_Part, Within : Stack) return Boolean
2      with Ghost;

Key to the usage is the fact that by passing This'Old and This to the two parameters we can compare the before/after states of a single object. Viewing the function's implementation will help in understanding its use in the postconditions:

1    function Unchanged (Invariant_Part, Within : Stack) return Boolean is
2      (Invariant_Part.Top <= Within.Top and then
3       (for all K in 1 .. Invariant_Part.Top =>
4          Within.Values (K) = Invariant_Part.Values (K)));

This approach is based directly on a very clever one by Rod Chapman, as seen in some similar code. The function states that the array components logically contained in Invariant_Part must have the same values as the corresponding array components in Within.
Note how we allow Invariant_Part to contain fewer values than the other stack (line 2 above). That is necessary because we use this function in the postconditions for both the Push and Pop operations, in which one more or one less Element value will be present, respectively.

For Push, we add a call to the function in the postcondition as line 6, below:

1    procedure Push (This : in out Stack; Item : Element) with
2      Pre    => not Full (This),
3      Post   => not Empty (This)
4                and then Top_Element (This) = Item
5                and then Extent (This) = Extent (This)'Old + 1
6                and then Unchanged (This'Old, Within => This),
7      Global => null;

This'Old provides the value of the stack prior to the call of Push, without the new value included, whereas This represents the stack state after Push returns, with the new value in place. Thus, the prior values are compared to the corresponding values in the new state, with the newly included value ignored.

Likewise, we add the function call to the postcondition for Pop, also line 6, below:

An Introduction to Contract-Based Programming in Ada
Abe Cohen, Tue, 21 Apr 2020
https://blog.adacore.com/the-case-for-contracts

One of the most powerful features of Ada 2012* is the ability to specify contracts on your code. Contracts describe conditions that must be satisfied upon entry (preconditions) and upon exit (postconditions) of your subprogram. Preconditions describe the context in which the subprogram must be called, and postconditions describe conditions that will be adhered to by the subprogram’s implementation. If you think about it, contracts are a natural evolution of Ada’s core design principle: to encourage developers to be as explicit as possible with their expressions, putting both the compiler/toolchain and other developers in the best position to help them develop better code.
The addition of contracts to a standard Ada application accomplishes several elusive objectives: they act as a static method of handling potential errors, serve as documentation that is updated and checked for consistency by the compiler alongside your code, and provide static analysis tools like SPARK and CodePeer with more application-specific detail they can use to produce higher-quality results. So let’s get started.

package Graph is

   type Graph_Record (Nodes : Positive) is record
      Adj_List  : Adjacency_List (1 .. Nodes);
      Node_List : Node_List_Type (1 .. Nodes);
   end record;

   procedure Set_Source (Graph : in out Graph_Record; ID : Positive);

end Graph;

package body Graph is

   procedure Set_Source (Graph : in out Graph_Record; ID : Positive) is
   begin
      Graph.Node_List (ID).dist := 0;
   end Set_Source;

end Graph;

Here is a package with a simple subprogram that sets a property of a graph. One thing to notice about the graph from its definition is that its nodes are labelled with IDs from 1 to the number of nodes. To make sure that our subprogram doesn’t index out of bounds into the graph’s list of nodes, we might do a number of things. We could change Set_Source to a function that returns a Boolean: True if the operation was successful, False if the supplied ID is out of range. Another option is to do nothing and rely on the default compiler-inserted array access check (I'll get into the drawbacks of this later), or we could even insert an explicit defensive check of our own if we want to raise a specific exception with a specific message. However, all of these approaches come with two fundamental issues: they require additional documentation to be effective, and they rely on checks and/or exception handlers at run-time to prevent errors, which can hurt performance. By adding a simple precondition, we can mitigate both of these problems at the same time.
procedure Set_Source (Graph : in out Graph_Record; ID : Positive) with
   Pre => (ID <= Graph.Nodes);

The documentation issue is the more obvious one, so I’ll address it first. Anyone using this API, even someone without access to the implementation, now knows that this subprogram expects to be called with the ID parameter in a specific range, yet no additional documentation is needed to express this. If we were using conventional methods, we would need another way to tell API users how to call this subprogram correctly. Using contracts in this manner integrates the task of writing and updating documentation with the subprogram’s design process. On top of that, if the subprogram were redesigned, say if the Graph record type were broadened to accept characters as indices for Node_List, those new requirements would be reflected in the new preconditions, with no additional information needed. In addition to helping other developers use your subprograms properly, contracts introduce a static methodology for dealing with errors. Conventionally, errors are dealt with via defensive checks and exception handlers at run-time. Particularly in an embedded context, where the final executable's size in memory and computational demands need to be optimized, reducing run-time code is essential to dealing with hardware constraints. Accordingly, many programs have no choice but to trust that their testing infrastructure was sufficient and ship code with most run-time checks turned off. However, it’s not revolutionary to say that every project wishes its application ran safely with less overhead. Contracts provide an elegant way for developers higher in the call chain to take appropriate action to avoid violating known conditions that would cause program failure, without adding run-time code at every level, as would happen with explicit or compiler-inserted defensive checks or propagating exception handlers.
Sometimes, though, as in the case of input validation, there’s no way to avoid defensive code at run-time. Contracts provide the flexibility to add these checks both broadly and at a granular level. If you pass the ‘-gnata’ switch to the compiler, it will insert checks that your contracts are not violated alongside the standard Ada run-time checks, like range checks on types. However, if you just want to enable a single defensive check, you can do something like this:

pragma Assertion_Policy (Pre => Check, Post => Ignore);

procedure Set_Source (Graph : in out Graph_Record; ID : Positive) is
begin
   Graph.Node_List (ID).dist := 0;
end;

The use of contracts can also increase organizational confidence that testing was in fact sufficient and accounted for all the potential ways in which the application could fail. Even if you’re not at the level of statically verifying contracts to be unbreakable within the context of your application with SPARK, other static analysis tools, like CodePeer, can benefit from the extra information contracts provide about the intended use of your code. This is because, in this context, contracts are language-level proxies for your application’s requirements, and CodePeer, like many other tools, works only on language-level constructs. When CodePeer analyzes a subprogram, it generates implicit pre- and postconditions as part of the analysis. If one of those implicit contracts might be violated, you might get a message like this:

medium: precondition (array index check) might fail on call to graph.set_source: requires ID <= Graph.Nodes

However, when you supply CodePeer with your own contracts to compare against its own, it can report situations in which user-supplied contracts contradict the implicit ones, leading to more specific, more actionable findings and fewer false positives. To learn more about contracts, check out this chapter from learn.adacore.com, or this section of the SPARK documentation.
*Contracts can also be used via pragma Precondition and pragma Postcondition with older versions of GNAT, or approximated with pragma Assert as defined in Ada 2005. Learn more about that here.

]]> Ada on the ESP8266 https://blog.adacore.com/ada-on-the-esp8266 Thu, 09 Apr 2020 11:31:00 +0000 Johannes Kliemann

llvm-gnatmake -c unit.adb -cargs --target=xtensa -mcpu=esp8266
$ cd /path/to/Arduino/libraries
$ git clone --recursive https://github.com/jklmnn/esp8266-ada-example.git
$ cd esp8266-ada-example
$ make
$ screen /dev/ttyUSB0 115200
Make with Ada! Make with Ada! Make with Ada! Make with Ada!

]]> A Trivial File Transfer Protocol Server written in Ada https://blog.adacore.com/the-elegance-of-open-source-collaboration Tue, 07 Apr 2020 11:50:00 +0000 Martyn Pike

For an upcoming project, I needed a simple way of transferring binary files over an Ethernet connection with minimal (if any) user interaction. A protocol that's particularly appropriate for this kind of usage is the Trivial File Transfer Protocol (TFTP). You can find a high-level description on Wikipedia and a more detailed breakdown of the protocol here. My previous experience with this protocol has mostly been within test rig environments, where a target computer loads its operational software payload from a TFTP server at boot time. The beauty of this approach is that it allows different payloads to be used between reboots simply by switching the files being served. That is exactly how this server will be used in my forthcoming project, also to be documented on blog.adacore.com. The Ada TFTP server will be hosted on Ubuntu Linux and support a subset of the transactions provided by the protocol. For example, the ability for the client to write a file to the server will not be supported. There are a number of TFTP servers available for Ubuntu Linux, but I wanted to implement my own in Ada, mainly to prove it could be done, but also to test a couple of different options for handling UDP/IP transactions from within Ada applications. To do this, I needed something to mimic the capabilities of GNAT.Sockets. To start with, I reviewed the current catalogue of available Ada software on GitHub and the repository from my good friends over at CodeLabs, who happen to be the developers of the Muen separation kernel, which will also figure in my forthcoming project. The CodeLabs team had exactly what I was looking for: Anet.
I highly recommend that you review its code on the CodeLabs repository. That repository is now my go-to for (publicly available and open source) high-quality examples of Ada and SPARK software development. Many of my future blog posts about applying AdaCore tools and techniques will be oriented around examples based on CodeLabs source code. Back to my TFTP server. Since I had two options for the UDP/IP transaction functionality, I decided to create one version of my server using GNAT.Sockets and another using Anet from CodeLabs. If you want to get ahead of the game, the code is available on my GitHub repository for all three parts of this blog series. The first step towards my objective was to obtain Anet by cloning the repository, building the library, and installing it in a location where the GNAT build tools can locate it. All the code for this blog post can be built with GNAT Community 2019 as well as GNAT Pro. The TFTP server code has been reviewed by CodePeer for the detection of run-time vulnerabilities that may lead to unexpected code execution paths, and by GNATcheck against a suitable coding standard. Both CodePeer and GNATcheck are available from AdaCore as professionally assured products and can be qualified as TQL-5 review tools for use by DO-178B/C projects. The following command sequence installs the Anet libraries into ~/sw/adalibs. With a suitable GNAT compiler in my PATH, I would execute the following commands:

git clone https://git.codelabs.ch/anet.git
cd anet
git checkout master
make all
make PREFIX=~/sw/adalibs install

After doing this, there will be a shared library called libanet.so stored in the ~/sw/adalibs/lib directory. Before writing code that will use this library, the GPR_PROJECT_PATH environment variable needs to identify the ~/sw/adalibs/lib/gnat directory.
This can be done with the following:

export GPR_PROJECT_PATH=~/sw/adalibs/lib/gnat

I encountered a slight learning curve with the Anet API because its architecture differs from that of GNAT.Sockets. However, the code documentation is very good, and before I knew it I had a proof of concept working. To build the code from GitHub (with the same GNAT compiler in the path that was used to build Anet), you can use the following:

export GPR_PROJECT_PATH=~/sw/adalibs/lib/gnat
git clone https://github.com/darthmartyn/adatftpd-anet
cd adatftpd-anet
make all

The makefile and GNAT project file are as follows:

# Assumes GPR_PROJECT_PATH includes the Anet installation.
# Try to use the same GNAT to build adatftpd that was used to
# build Anet.

all:
	gprbuild -p -P adatftpd.gpr

check:
	gnatcheck -P adatftpd.gpr --show-rule -rules -from=gnatcheck.rules

review:
	codepeer -P adatftpd.gpr -level 2 -output-msg

clean:
	gprclean -q -P adatftpd.gpr

with "anet.gpr";

project Adatftpd is

   for Languages use ("Ada");
   for Source_Dirs use ("src/**");
   for Object_Dir use "obj";
   for Exec_Dir use "test";
   for Main use ("main.adb");

   package Builder is
      for Executable ("main.adb") use "adatftpd-anet";
   end Builder;

   package Compiler is
      for Switches ("ada") use ("-gnata");
   end Compiler;

end Adatftpd;

Assuming no errors occurred during this sequence of commands, the 'test' sub-directory will contain the 'adatftpd-anet' executable. You can also check out the verification program I wrote for my TFTP server, which is also available on my GitHub repository. I'd welcome feedback and collaboration on either of these TFTP-related projects.
]]> Proving properties of constant-time crypto code in SPARKNaCl https://blog.adacore.com/proving-constant-time-crypto-code-in-sparknacl Thu, 02 Apr 2020 12:15:00 +0000 Roderick Chapman

#define FOR(i,n) for (i = 0; i < n; ++i)
#define sv static void
typedef unsigned char u8;
typedef long long i64;
typedef i64 gf[16];

sv pack25519(u8 *o, const gf n);

subtype I32 is Integer_32;
subtype N32 is I32 range 0 .. I32'Last;
subtype I64 is Integer_64;
subtype Index_32 is I32 range 0 .. 31;
type Byte_Seq is array (N32 range <>) of Byte;
subtype Bytes_32 is Byte_Seq (Index_32);

-- "LM"   = "Limb Modulus"
-- "LMM1" = "Limb Modulus Minus 1"
LM   : constant := 65536;
LMM1 : constant := 65535;

-- "R2256" = "Remainder of 2**256 (modulo 2**255-19)"
R2256 : constant := 38;

-- "Maximum GF Limb Coefficient"
MGFLC : constant := (R2256 * 15) + 1;

-- "Maximum GF Limb Product"
MGFLP : constant := LMM1 * LMM1;

subtype GF_Any_Limb is I64 range -LM .. (MGFLC * MGFLP);
type GF is array (Index_16) of GF_Any_Limb;

subtype GF_Normal_Limb is I64 range 0 .. LMM1;

subtype Normal_GF is GF
  with Dynamic_Predicate =>
    (for all I in Index_16 => Normal_GF (I) in GF_Normal_Limb);

-- Reduces N modulo (2**255 - 19) then packs the
-- value into 32 bytes little-endian.
function Pack_25519 (N : in Normal_GF) return Bytes_32
  with Global => null;

sv pack25519 (u8 *o, const gf n)
{
  int i, j, b;
  gf m, t;
  FOR(i,16) t[i]=n[i];
  car25519(t);
  car25519(t);
  car25519(t);
  FOR(j, 2) {
    m[0]=t[0]-0xffed;
    for(i=1;i<15;i++) {
      m[i]=t[i]-0xffff-((m[i-1]>>16)&1);
      m[i-1]&=0xffff;
    }
    m[15]=t[15]-0x7fff-((m[14]>>16)&1);
    b=(m[15]>>16)&1;
    m[14]&=0xffff;
    sel25519 (t, m, 1-b);
  }
  FOR(i, 16) {
    o[2*i]=t[i]&0xff;
    o[2*i+1]=t[i]>>8;
  }
}

sv sel25519 (gf p, gf q, int b);

-- Constant time conditional swap of P and Q.
procedure CSwap
  (P    : in out GF;
   Q    : in out GF;
   Swap : in     Boolean)
  with Global => null,
       Contract_Cases =>
         (Swap     => (P = Q'Old and Q = P'Old),
          not Swap => (P = P'Old and Q = Q'Old));

if Swap then
   Temp := P;
   P := Q;
   Q := Temp;
end if;

sv sel25519 (gf p, gf q, int b)
{
  i64 t, i, c = ~(b-1);
  FOR(i, 16) {
    t = c & (p[i]^q[i]);
    p[i]^=t;
    q[i]^=t;
  }
}

type Bit_To_Swapmask_Table is array (Boolean) of U64;

Bit_To_Swapmask : constant Bit_To_Swapmask_Table :=
  (False => 16#0000_0000_0000_0000#,
   True  => 16#FFFF_FFFF_FFFF_FFFF#);

pragma Assume (for all K in I64 => To_I64 (To_U64 (K)) = K);

procedure CSwap
  (P    : in out GF;
   Q    : in out GF;
   Swap : in     Boolean)
is
   T : U64;
   C : U64 := Bit_To_Swapmask (Swap);
begin
   for I in Index_16 loop
      T := C and (To_U64 (P (I)) xor To_U64 (Q (I)));
      P (I) := To_I64 (To_U64 (P (I)) xor T);
      Q (I) := To_I64 (To_U64 (Q (I)) xor T);
      pragma Loop_Invariant
        (if Swap then
           (for all J in Index_16 range 0 .. I =>
              (P (J) = Q'Loop_Entry (J) and Q (J) = P'Loop_Entry (J)))
         else
           (for all J in Index_16 range 0 .. I =>
              (P (J) = P'Loop_Entry (J) and Q (J) = Q'Loop_Entry (J))));
   end loop;
end CSwap;

-- Subtracting P twice from a Normal_GF might result
-- in a GF where limb 15 can be negative, with lower bound -65536.
subtype Temp_GF_MSL is I64 range -LM .. LMM1;

subtype Temp_GF is GF
  with Dynamic_Predicate =>
    (Temp_GF (15) in Temp_GF_MSL and
       (for all K in Index_16 range 0 .. 14 =>
          Temp_GF (K) in GF_Normal_Limb));

procedure Subtract_P
  (T         : in     Temp_GF;
   Result    :    out Temp_GF;
   Underflow :    out Boolean)
  with Global => null,
       Pre  => T (15) >= -16#8000#,
       Post => (Result (15) >= T (15) - 16#8000#);

subtype I64_Bit is I64 range 0 .. 1;

procedure Subtract_P
  (T         : in     Temp_GF;
   Result    :    out Temp_GF;
   Underflow :    out Boolean)
is
   Carry : I64_Bit;
   R     : GF;
begin
   R := (others => 0);

   -- Limb 0 - subtract LSL of P, which is 16#FFED#
   R (0) := T (0) - 16#FFED#;

   -- Limbs 1 .. 14 - subtract FFFF with carry
   for I in Index_16 range 1 .. 14 loop
      Carry := ASR_16 (R (I - 1)) mod 2;
      R (I) := T (I) - 16#FFFF# - Carry;
      R (I - 1) := R (I - 1) mod LM;
      pragma Loop_Invariant
        (for all J in Index_16 range 0 .. I - 1 =>
           R (J) in GF_Normal_Limb);
      pragma Loop_Invariant (T in Temp_GF);
   end loop;

   -- Limb 15 - subtract MSL (Most Significant Limb)
   -- of P (16#7FFF#) with carry.
   -- Note that limb 15 might become negative on underflow.
   Carry := ASR_16 (R (14)) mod 2;
   R (15) := (T (15) - 16#7FFF#) - Carry;
   R (14) := R (14) mod LM;

   -- Note that R (15) is not normalized here, so that the
   -- result of the first subtraction is numerically correct
   -- as the input to the second.
   Underflow := R (15) < 0;
   Result := R;
end Subtract_P;

function Pack_25519 (N : in Normal_GF) return Bytes_32 is
   L      : GF;
   R1, R2 : Temp_GF;
   First_Underflow  : Boolean;
   Second_Underflow : Boolean;
begin
   L := N;
   Subtract_P (L, R1, First_Underflow);
   Subtract_P (R1, R2, Second_Underflow);
   CSwap (R1, R2, Second_Underflow);
   CSwap (L, R2, First_Underflow);
   return To_Bytes_32 (R2);
end Pack_25519;

sparknacl-utils.adb:197:27: medium: predicate check might fail

-- Result := T - P;
-- if Underflow, then Result is not a Normal_GF
-- if not Underflow, then Result is a Normal_GF
procedure Subtract_P
  (T         : in     Temp_GF;
   Result    :    out Temp_GF;
   Underflow :    out Boolean)
  with Global => null,
       Pre  => T (15) >= -16#8000#,
       Post => (Result (15) >= T (15) - 16#8000#)
               and then (Underflow /= (Result in Normal_GF));

R (14) := R (14) mod LM;

R (15) := R (15) mod LM;

sparknacl-utils.adb:139:23: medium: predicate check might fail

]]> Time travel debugging in GNAT Studio with GDB and RR https://blog.adacore.com/time-travel-debugging-in-gnat-studio Tue, 17 Mar 2020 13:18:07 +0000 Ghjuvan Lacambre

with Ada.Numerics.Discrete_Random;

procedure Main is
   package Rand_Positive is new Ada.Numerics.Discrete_Random (Positive);
   Generator : Rand_Positive.Generator;

   Error : exception;
   Bug   : Boolean := False;

   procedure Make_Bug is
   begin
      Bug := True;
   end Make_Bug;

   procedure Do_Bug is
   begin
      Bug := True;
   end Do_Bug;

begin
   Rand_Positive.Reset (Generator);

   for I in 1 .. 10 loop
      if Rand_Positive.Random (Generator) < (Positive'Last / 100) then
         if Rand_Positive.Random (Generator) < (Positive'Last / 2) then
            Make_Bug;
         else
            Do_Bug;
         end if;
      end if;
   end loop;

   if Bug then
      raise Error;
   end if;
end Main;

]]> Android application with Ada and WebAssembly https://blog.adacore.com/android-application-with-ada-and-webassembly Thu, 12 Mar 2020 14:08:19 +0000 Maxim Reznik

Having previously shown how to create a Web application in Ada, it's not so difficult to create an Android application in Ada. Perhaps the simplest way is to install Android Studio. Then just create a new project and choose "Empty Activity". Open the layout, delete the TextView, and put a WebView in its place. In the onCreate function, write the initialization code:

WebView webView = (WebView) findViewById(R.id.webView);
WebSettings settings = webView.getSettings();
settings.setJavaScriptEnabled(true);

To make the WebView work offline, you need to provide content. One way to do this is to put the content in the asset folder and open it as a URL in the WebView. When a user starts the application, the WebView will load the HTML and the corresponding JavaScript. The JavaScript then loads the WebAssembly and so, in effect, launches the Ada code. But it can't use a file:/// schema to load the JavaScript and WebAssembly files because of the default security settings. So we trick the WebView by intercepting requests, also providing the correct MIME types for them.
We do this using the shouldInterceptRequest method of the WebViewClient class to intercept any request for HTML/WASM/JS/JPEG resources and load the corresponding file from the asset folder:

public WebResourceResponse shouldInterceptRequest(WebView view, WebResourceRequest request) {
    String path = request.getUrl().getLastPathSegment();
    try {
        String mime;
        AssetManager assetManager = getAssets();
        if (path.endsWith(".html")) mime = "text/html";
        else if (path.endsWith(".wasm")) mime = "application/wasm";
        else if (path.endsWith(".mjs")) mime = "text/javascript";
        else if (path.endsWith(".jpg")) mime = "image/jpeg";
        else return super.shouldInterceptRequest(view, request);
        InputStream input = assetManager.open("www/" + path);
        return new WebResourceResponse(mime, "utf-8", input);
    } catch (IOException e) {
        e.printStackTrace();
        ByteArrayInputStream result = new ByteArrayInputStream
            (("X:" + path + " E:" + e.toString()).getBytes());
        return new WebResourceResponse("text/plain", "utf-8", result);
    }
}

Now connect this code to the WebView, like this:

webView.setWebViewClient(new WebViewClient() {
    @Override
    public WebResourceResponse shouldInterceptRequest(WebView view, ....
});

For debugging purposes, let's connect the WebView console to the Android log. We just add this function below the code for shouldInterceptRequest:

public boolean onConsoleMessage(ConsoleMessage cm) {
    Log.d("MyApplication", cm.message() + " -- From line "
        + cm.lineNumber() + " of " + cm.sourceId());
    return true;
}

Now we're able to build and run an Android package. Here is how it looks on the Android Studio emulator (it's been tested on my phone too!): If you need the complete code, there's a repository on GitHub! PS: This article doesn't discuss how we produced WebAssembly from Ada code for running with WebGL integration. We will write a follow-up post about that soon!
]]> Making an RC Car with Ada and SPARK https://blog.adacore.com/making-an-rc-car-with-ada-and-spark Tue, 10 Mar 2020 13:52:00 +0000 Pat Rogers

As a demonstration of the use of Ada and SPARK on very small embedded targets, I created a remote-controlled (RC) car using Lego NXT Mindstorms motors and sensors but without using the Lego computer or Lego software. I used an ARM Cortex System-on-Chip board for the computer, and all the code -- the control program, the device drivers, everything -- is written in Ada. Over time, I’ve upgraded some of the code to be in SPARK. This blog post describes the hardware, the software, the SPARK upgrades, and the repositories that are used and created for this purpose. Why use Lego NXT parts? The Lego NXT robotics kit was extremely popular, and many schools and individuals still have kits and third-party components. Even though the latest Lego kit is much more capable, the ubiquity and low cost of the NXT components make them an attractive basis for experiments and demonstrations. In addition, there are many existing NXT projects upon which to base demonstrations using Ada. For example, the RC car is based on the third-party HiTechnic IR RC Car, following instructions available here: http://www.hitechnic.com/models. The car turns extremely well because it has an Ackermann steering mechanism, so that the inside wheel turns more sharply than the outside wheel, and a differential on the drive shaft so that the drive wheels can rotate at different speeds during a turn. The original car uses the HiTechnic IR (infra-red) receiver to communicate with a Lego remote control. This new car uses that same receiver and controller, but also supports another controller communicating over Bluetooth LE. Replacing the NXT Brick The NXT embedded computer controlling NXT robots is known as the “brick,” probably because of its appearance. (See Figure 1.)
It consists of an older 48 MHz ARM7, with 256 KB of FLASH and 64 KB of RAM, as well as an AVR co-processor. The brick enclosure provides an LCD screen, a speaker, Bluetooth, and four user buttons, combined with the electronics required to interface to the external world. A battery pack is on the back. Our replacement computer is one of the “Discovery Kit” products from STMicroelectronics. The Discovery Kits have ARM Cortex processors and include many on-package devices for interfacing to the external world, including A/D and D/A converters, timers, UARTs, DMA controllers, I2C and SPI communication, and others. Sophisticated external components are also included, depending upon the specific kit. Specifically, we use the STM32F4 Discovery Kit, which has a Cortex-M4 MCU running at up to 168 MHz, a floating-point co-processor, a megabyte of FLASH, and 192 KB of RAM. It also includes an accelerometer, a MEMS microphone, an audio codec, a user button, and four user LEDs. (See Figure 2.) It is very inexpensive, approximately $15. Details are available here:

https://www.st.com/en/evaluation-tools/stm32f4discovery.html

I made one change to the Discovery Kit board as received from the factory. Because the on-package devices, such as the serial ports, I2C devices, timers, etc. all share potentially overlapping groups of GPIO pins, and because not all pins are available on the headers, not all the pins required were exclusively available for all the devices needed for the RC car. Ultimately, I found a set of pin allocations that would almost work, but I needed pin PA0 to do it. However, pin PA0 is dedicated to the blue User button by a solder bridge on the underside of the board. I removed that solder bridge to make PA0 available. Of course, doing so disabled the blue User button but I didn’t need it for this project.

Replacing the NXT brick also removed the internal interface electronics for the motors and sensors. I used a combination of a third-party board and hand-made circuits to replace them. A brief examination of the motors will serve to explain why the additional board was chosen.

The Lego Mindstorms motors are 9-volt DC motors with a precise rotation sensor and significant gear reduction producing high torque. The motors rotate at a rate relative to the power applied and can rotate in either direction. The polarity of the power lines controls the rotation direction: positive rotates one way, negative rotates the other way.

Figure 3 illustrates the partial internals of the NXT motor, including the gear train in light blue, and the rotation sensor to the left in dark blue, next to the motor itself in dark orange. (The dark gray part at far left is the connector housing.)

I mentioned that the polarity of the applied power determines the rotation direction. Controlling that polarity requires an external circuit, specifically an “H-bridge” circuit, that allows us to achieve that effect.

Figure 4 shows the functional layout of the H-bridge circuit, in particular the arrangement of the four switches S1 through S4 around the motor M. By selectively closing two switches and leaving the other two open we can control the direction of the current flow, and thereby control the direction of the motor rotation.

Figure 5 illustrates two of the three useful switch configurations. The red line shows the current flow. Another option is to close the two switches on the same end, shorting the motor terminals together, in which case the rotor will “lock” in place. Opening all the switches removes all power and thus does not cause rotation. The fourth possible combination, in which all switches are closed, is not used because it would short-circuit the power supply.

Rather than build my own H-bridge circuit I used a low-cost product dedicated to interfacing with NXT motors and sensors. In addition to the H-bridge circuits, they also provide filters for the rotation sensor’s discrete inputs so that noise does not result in too many false rotation counts. There are a number of these products available.

One such is the “Arduino NXT Shield Version 2” by TKJ Electronics: http://www.tkjelectronics.dk/ in Denmark. The product is described in their blog, here: http://blog.tkjelectronics.dk/2011/10/nxt-shield-ver2/  and is available for sale here: http://shop.tkjelectronics.dk/product_info.php?products_id=29 for a reasonable price.

The “NXT Shield” can control two NXT motors and one sensor requiring 9 volts input, including a Mindstorms NXT Ultrasonic Sensor. Figure 6 shows the NXT Shield with the two standard NXT connectors on the left for the two motors, and the sensor connector on the right.

The kit requires assembly but it is just through-board soldering. As long as you get the diodes oriented correctly everything is straightforward. Figure 7 (below) shows our build, already located in an enclosure and connected to the Discovery Kit, power, two NXT motors, and the ultrasonic sensor.

The incoming 9 volts is routed to a DC power jack on the back of the enclosure, visible on the bottom left with red and black wires connecting it to the board. The 5 volts for the on-board electronics comes via the Discovery Kit header and is bundled with the white and green wires coming in through the left side in the figure. The enclosure itself is one of the “Make with Ada” boxes. “Make with Ada” is a competition offering serious prize money for cool projects using embedded targets and Ada. See http://www.makewithada.org/ for more information.

The power supply replacing the battery pack on the back of the NXT brick is an external battery intended for charging cell phones and tablets.

This battery provides separate connections for +5 and +9 (or +12) volts, which is very convenient: the +5V is provided via USB connector, which is precisely what the STM32F4 card requires, and both the NXT motors and the NXT ultrasonic sensor require +9 volts. The battery isn't light but holds a charge for a very long time, especially with this relatively light load. Note that the battery can also provide +12 volts instead of +9, selected by a physical slider switch on the side of the battery. Using +12 volts will drive the motors considerably faster and is (evidently) tolerated by the NXT Shield sensor circuit and the NXT Ultrasonic Sensor itself.

Finally, I required a small circuit supporting the I2C communication with the HiTechnic IR Receiver. The circuit is as simple as one can imagine: power, ground, and a pull-up resistor for each of the two I2C communication lines. These components are housed in the traditional Altoids tin and take power and ground from the Discovery Kit header pins. The communication lines go to specific GPIO header pins.

All of these replacements and the overall completed car (known as "Bob"), are shown in the following images:

Figure 10 shows the rear enclosure containing the NXT Shield board, labeled “Make With Ada” on the outside, and the Altoids tin on the side containing the small circuit for the IR receiver.

Here is the car in action:

Replacing the NXT Software

The Ada Drivers Library (ADL), provided by AdaCore and the Ada community, supplies the device drivers for the timers, I2C, A/D and D/A converters, and other devices required to replace those in the NXT brick. The ADL supports a variety of development platforms from various vendors, including the STM32 series boards. The ADL is available on GitHub for both non-proprietary and commercial use here: https://github.com/AdaCore/Ada_Drivers_Library.

Replacing the brick will also require drivers for the NXT sensors and motors, software that is not included in the ADL. However, we can base them on the ADL drivers for our target board. For example, the motor rotary encoder driver uses the STM32 timer driver internally because those timers directly support quadrature rotation encoders. All these abstractions, including some that are not hardware specific, are in the Robotics with Ada repository: https://github.com/AdaCore/Robotics_with_Ada. This repo supports the NXT motors and all the basic sensors, as well as some third-party sensors. Abstract base types are used for the more complex sensors so that new sensors can be created easily using inheritance.

In addition, the repository contains some signal processing and control system software, e.g., a “recursive moving average” (RMA) noise filter type and a closed-loop PID controller type. These require further packages, such as a bounded ring buffer abstraction.

For example, the analog sensors (e.g., the light and sound sensors) have an abstract base class controlling an ADC, and two abstract subclasses using DMA and polling, respectively, to transfer the converted data. The concrete light and sound sensor types are derived from the DMA-based parent type (Figure 11).

The so-called NXT “digital” devices contain an embedded chip. These follow a similar design with an abstract base class and concrete subclass drivers for the more sophisticated, complex sensors. Lego refers to these sensors as “digital” sensors because they do not provide an analog signal to be sampled. Instead, the drivers both command and query the internal chips to operate the sensors.

The sensors’ chips use the NXT hardware cable connectors’ two discrete I/O lines to communicate. Therefore, a serial communications protocol based on two wires is applied. This communication protocol is usually, but not always, the “I2C” serial protocol.  The Lego Ultrasonic Sonar sensor and the HiTechnic IR Receiver sensor both use I2C for communication. In contrast, version 2 of the Lego Color sensor uses the two discrete lines with an ad-hoc protocol.

The HiTechnic IR Receiver driver uses the I2C driver from the ADL for the on-package I2C hardware. That is a simple approach that also offloads the work from the MCU. The NXT Ultrasonic sensor, on the other hand, was a problem.  I could send data to the Ultrasonic sensor successfully using the on-package I2C hardware (via the ADL driver) but could not get any data back. As discussed on the Internet, the problem is that the sensor does not follow the standard I2C protocol. It requires an extra communication line state change in the middle of the receiving steps. I could not find a way to make the on-package I2C hardware in the ARM package do this extra line change. The NXT Shield hardware even includes a GPIO “back door” connection to the I2C data line for this purpose, but I could not make that work with the STM32 hardware. Ultimately, I had to use a bit-banged approach in place of the I2C hardware and ADL driver. Fortunately, the vendor of the NXT Shield also provides the source code for an ultrasonic sensor driver in C++ using the Arduino “Wire” interface for I2C so I could see exactly what was required.

Bit-banging has system-wide implications. Since the software is doing the low-level communication instead of the on-package I2C hardware, interrupting the software's execution in the middle of the protocol could be a problem. That means the priority of the task handling the device must be sufficiently high relative to the other tasks in the system. Bit-banging also adds utilization of the MCU that would otherwise be offloaded to a separate I2C hardware device. Our application is rather simple, so processor overload is not a problem. Care with the task priorities was required, though.

As the sensor's name indicates, you cannot hear the ultrasonic pings. However, I recorded the videos on my cellphone, and its microphone does detect them. The pings are necessarily very directional, so they are only heard in the video when the car is pointing at the phone. Here is another short video of the car, stationary, with the camera immediately in front; the pings are quite noticeable:

System Architecture

The overall architecture of the control software is shown below in Figure 12.

In the diagram, the parallelograms are periodic tasks (threads), running until power is removed. Each task is located inside a dedicated package. The Remote_Control package and the Vehicle package also provide functions that are callable by clients. Calls are indicated by dotted lines, with the arrowhead indicating the flow of data. For example, the Servo task in the Steering_Control package calls the Remote_Control package’s function to get the currently requested steering angle.

The Steering Motor and Propulsion Motor boxes represent the two NXT motors. Each motor has a dedicated rotary encoder inside the motor housing but the diagram depicts them as distinct in order to more clearly show their usage. The PID controller and vehicle noise filter are completely in software.

The PID (Proportional Integral Derivative) controller is a closed-loop control mechanism that uses feedback from the system under control to maintain a requested value. These mechanisms are ubiquitous, for example in your house's thermostat maintaining your requested heating and cooling temperatures. In our case, the PID controller maintains the requested steering angle using the steering motor's encoder data as the feedback signal.
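
For readers unfamiliar with the mechanism, here is a minimal PID update in Python. This is an illustrative sketch of the general algorithm, not the article's Ada implementation; the fixed-period integration and output clamping mirror what the text describes, but the class name and clamping details are assumptions:

```python
class PID:
    """Minimal fixed-period PID controller sketch (illustrative only)."""

    def __init__(self, kp, ki, kd, period, out_min, out_max):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.period = period                  # fixed call interval (seconds)
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_input = None

    def compute(self, process_variable, setpoint):
        error = setpoint - process_variable
        # Integrate at the fixed period; clamp to limit integrator wind-up.
        self.integral += self.ki * error * self.period
        self.integral = max(self.out_min, min(self.out_max, self.integral))
        # Derivative on the measurement avoids kicks on setpoint changes.
        d_input = 0.0 if self.prev_input is None else (
            process_variable - self.prev_input) / self.period
        self.prev_input = process_variable
        output = self.kp * error + self.integral - self.kd * d_input
        # Clamp the output to the configured limits, as the article's
        # Output_Limits parameter does.
        return max(self.out_min, min(self.out_max, output))
```

In the car, the process variable is the measured steering angle, the setpoint is the requested angle, and the clamped output is the signed motor power.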

The noise filter is a “recursive moving average” filter, commonly used in digital signal processing to smooth sensor inputs. (Although the third-party interface board removed some encoder noise, some remained.) The PID controller did not require an encoder noise filter because the mechanical steering mechanism has enough “play” in it that encoder noise has no observable effect. The vehicle's measured-speed calculation, however, did need the filter, because those values are used only within the software rather than being absorbed by a physical effector.
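
The idea behind the "recursive" update can be sketched in a few lines of Python (illustrative only, not the project's Ada component): rather than re-summing the whole window for every sample, the filter keeps a running total, adds the newest sample, and subtracts the oldest, so each update is O(1):

```python
from collections import deque

def make_rma_filter(window):
    """Recursive moving average over a fixed-size window.

    The running total is updated incrementally: add the newest sample,
    subtract the oldest once the window is full.
    """
    buf = deque()
    total = 0.0

    def step(sample):
        nonlocal total
        buf.append(sample)
        total += sample
        if len(buf) > window:
            total -= buf.popleft()   # drop the oldest sample
        return total / len(buf)

    return step
```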

The collision detection logic determines whether a collision is imminent, using the NXT ultrasonic sensor data and the vehicle's current speed as inputs. If a sufficiently close object is detected ahead and the vehicle is moving forward, the engine Controller task stops the car immediately. Otherwise, such objects, if any, are ignored.
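
The rule just described can be sketched as a simple predicate. This is a hypothetical Python illustration; the threshold value, units, and function name are assumptions, not taken from the project:

```python
def collision_imminent(distance_cm, speed_cm_per_s, threshold_cm=30.0):
    """Sketch of the described collision rule (values are assumptions):
    a stop is triggered only when an object is sufficiently close ahead
    AND the vehicle is actually moving forward."""
    moving_forward = speed_cm_per_s > 0.0
    return moving_forward and distance_cm <= threshold_cm
```

The second condition is what makes a stationary car ignore nearby objects, exactly as the text describes.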

Application Source Code Example: Steering Servo

As the system diagram shows, the application consists of four primary packages, each containing a dedicated task. (There are other packages as well, but they do not contain tasks.) The task in the Steering_Control package is named “Servo” because it is acting as a servomechanism: it has a feedback control loop. In contrast, the task “Controller” in the Engine_Control package is not acting as a servo because it uses “open loop” control without any feedback. It simply sets the motor power to the requested percentage, with the resulting speed depending on the available battery power and the load on the wheels. I could have also used a PID controller to maintain a requested speed, varying the power as required, but did not do so in this version of the application.

The source code for the “Servo” task in the Steering_Control package is shown below, along with the declarations for two subprograms called by the task.

   function Current_Motor_Angle (This : Basic_Motor) return Real with Inline;

   procedure Convert_To_Motor_Values
     (Signed_Power : Real;
      Motor_Power  : out NXT.Motors.Power_Level;
      Direction    : out NXT.Motors.Directions)
   with
     Inline,
     Pre => Within_Limits (Signed_Power, Power_Level_Limits);

   task body Servo is
      Next_Release       : Time;
      Target_Angle       : Real;
      Current_Angle      : Real := 0.0;  -- zero for call to Steering_Computer.Enable
      Steering_Power     : Real := 0.0;  -- zero for call to Steering_Computer.Enable
      Motor_Power        : NXT.Motors.Power_Level;
      Rotation_Direction : NXT.Motors.Directions;
      Steering_Offset    : Real;
      Steering_Computer  : Closed_Loop.PID_Controller;
   begin
      Steering_Computer.Configure
        (Proportional_Gain => Kp,
         Integral_Gain     => Ki,
         Derivative_Gain   => Kd,
         Period            => System_Configuration.Steering_Control_Period,
         Output_Limits     => Power_Level_Limits,
         Direction         => Closed_Loop.Direct);

      Initialize_Steering_Mechanism (Steering_Offset);

      Global_Initialization.Critical_Instant.Wait (Epoch => Next_Release);

      Steering_Computer.Enable (Current_Angle, Steering_Power);
      loop
         pragma Loop_Invariant (Steering_Computer.Current_Output_Limits = Power_Level_Limits);
         pragma Loop_Invariant (Within_Limits (Steering_Power, Power_Level_Limits));

         Current_Angle := Current_Motor_Angle (Steering_Motor) - Steering_Offset;

         Target_Angle := Real (Remote_Control.Requested_Steering_Angle);
         Limit (Target_Angle, -Steering_Offset, Steering_Offset);

         Steering_Computer.Compute_Output
           (Process_Variable => Current_Angle,
            Setpoint         => Target_Angle,
            Control_Variable => Steering_Power);

         Convert_To_Motor_Values (Steering_Power, Motor_Power, Rotation_Direction);

         Steering_Motor.Engage (Rotation_Direction, Motor_Power);

         Next_Release := Next_Release + Period;
         delay until Next_Release;
      end loop;
   end Servo;


The PID controller object declared on line 19 is of a type declared in package Closed_Loop, an instantiation of a generic package. The package is generic so that the specific floating-point input/output type is not hard-coded. The task first configures the PID controller object named Steering_Computer, specifying the PID gain parameters, the interval at which the output routine is called, and the upper and lower limits for the output value (lines 21 through 27). The task then initializes the mechanical steering mechanism in order to get the steering offset (line 29). This offset is required because the steering angle requests from the user (via the remote control) are based on a frame of reference oriented on the major axis of the vehicle. Because I use the steering motor rotation angle to steer the vehicle, the code must translate the requests from the user's frame of reference (i.e., the vehicle's) into the frame of reference of the steering motor. The steering motor's frame of reference is defined by the steering mechanism's physical connection to the car’s frame and is not aligned with the car’s major axis. Therefore, to do the translation, the code sets the motor encoder to zero at some known point relative to the vehicle's major axis (line 29) and then handles the difference (line 38) between that motor "zero" and the "zero" corresponding to the vehicle. The code thus orients the steering motor's frame of reference to that of the vehicle, and hence to the user's.

Having completed these local initialization steps, the Servo task then waits for the “critical instant” at which all the tasks should begin their periodic execution (line 31). The critical instant is usually time T0, so the main procedure passes a common absolute time value to each task via the Epoch formal parameter, which each task copies into its Next_Release variable. Each task then uses its local Next_Release variable to compute its next iteration release time (lines 52 and 53) from that same initial epoch. Waiting for this critical-instant release also allows each task to wait for any prior processing in the main procedure to complete.

The task then enables the PID controller and goes into the loop. In each iteration, the task determines the current steering angle from the steering motor’s rotary encoder and the computed offset (line 38), gets the requested angle from the remote control and ensures it is within the steering mechanism’s physical limits (lines 40 and 41), then feeds the current angle and target angle into the PID controller (lines 43 through 46). The resulting output value is the steering motor power value required to reach the target angle.

The signed steering power is then converted into an NXT motor power percentage and rotation direction (line 48). Those values are used to engage the steering motor on line 50.

Finally, the task computes the next time it should be released for execution and then suspends itself until that point in time arrives (lines 52 and 53). All the tasks in the system use this same periodic looping idiom, as is expected for time-driven tasks in a Ravenscar tasking profile. (We are actually using the Jorvik tasking profile, based on Ravenscar and defined in Ada 202x. See http://www.ada-auth.org/standa...)
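
The arithmetic behind that idiom can be sketched in a few lines of Python (illustrative only): each release time is derived from the previous absolute release, not from "now", so a late wake-up in one iteration does not accumulate into schedule drift:

```python
def release_times(epoch, period, count):
    """Absolute release times of a periodic task, as in the 'delay until'
    idiom: Next_Release := Next_Release + Period.

    Because each release equals epoch + k * period, jitter in any single
    iteration cannot shift subsequent releases.
    """
    times = []
    next_release = epoch
    for _ in range(count):
        times.append(next_release)
        next_release += period
    return times
```

Computing the next release from the current clock instead (now + period) would let each late wake-up push every subsequent release later, which is exactly what the absolute-time idiom avoids.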

The PID controller is based on the Arduino PID library, version 1.1.1. The primary difference between my design and the Arduino design is that this Ada version does not compute the next time to execute. Instead, because Ada has such good real-time support, barring a design error we can be sure that the periodic task will call the PID output calculation routine at a fixed rate. Therefore, the Configure routine specifies this period, which is then used internally in the output computation. In addition, the PID object does not retain pointers to the input, setpoint, and output objects, for the sake of SPARK compatibility. We pass them as parameters instead.

For a great step-by-step explanation of the Arduino PID design and implementation, see this web page:

The PID controller abstract data type is declared within a generic package so that the input and output types need not be hard-coded. This specific implementation uses floating-point for the inputs and output, which gives us considerable dynamic range. The ARM MCU includes a floating-point unit so there is no performance penalty. However, if desired, a version using fixed-point types could be defined with the same API, trading some of the problems with floating point computations for problems with fixed-point computations. Neither is perfect.

One of my long-term goals for the RC Car was to upgrade to SPARK as much as possible. That effort is currently underway, and some of the packages and reusable components are now in SPARK. For example, the Steering_Control package, containing the Servo task and PID controller object, is now at the Silver level of SPARK, meaning that it is proven to have no run-time errors, including no overflows. That is the reason for the loop invariants in the Servo task (lines 35 and 36 above), and the precondition on procedure Convert_To_Motor_Values (line 9 above). In particular, the provers needed to be told that the output value limits for the PID controller remain unchanged in each iteration, and that the value of the PID controller output variable remains within those limits.

Other parts of the software are merely in the SPARK subset currently, but some are at the highest level. The recursive moving average (RMA) filter uses a bounded ring buffer type, for example, that is at Gold level, the level of functional proof of unit correctness.

I will continue to upgrade the code to the higher levels, at least the Silver level for proving absence of runtime errors. Ultimately, however, this process will require changes to the ADL drivers because they use access discriminants which are not compatible with SPARK. That is the remaining issue preventing clean proof for the Vehicle package and its Controller task, for instance.

Source Code Availability

The full project for the RC car, including some relevant documents, is here: https://github.com/AdaCore/RC_...


Martyn’s recent blog post showed small programs based on Libadalang to find uses of access types in Ada sources. Albeit short, these programs need to take care of all the tedious logistics around processing Ada sources: find the files to work on, create a Libadalang analysis context, use it to read the source files, etc. Besides, they are not very convenient to run:

$ gprls -s -P test.gpr | ./ptrfinder1 | ./ptrfinder2

The gprls command (shipped with GNAT Pro) is used here to get the list of sources that belong to the test.gpr project file. Wouldn’t it be nice if our programs could use the GNATCOLL.Projects API to read this project file themselves and get the list of sources to process from there? It’s definitely doable, but also definitely cumbersome: first we need to get the appropriate info from the command line (project file name, potentially target and runtime information, or a *.cgpr configuration file), then call all the various APIs to load the project, and many more operations. Such operations are so common for tools using Libadalang that we have decided to include helpers for them in the library itself, so that programs can focus on their real purpose.

The 20.1 Libadalang release provides building blocks to save you this trouble: check the App generic package in Libadalang.Helpers. Note that you can see a tutorial and its API reference in our nightly documentation. This package is intended to be used as a framework: you instantiate it with your settings at the top level of your program and call its Run procedure. App then takes control of the program: it parses command-line options and invokes the callbacks you provided when appropriate.

Let’s update Martyn’s programs to use App. The job of the first program (ptrfinder1) is to go through source files and report access type declarations and object declarations that have access types.
First, we declare some shortcuts for code brevity:

   package Helpers renames Libadalang.Helpers;
   package LAL renames Libadalang.Analysis;
   package Slocs renames Langkit_Support.Slocs;

Next, we can instantiate App:

   procedure Process_Unit
     (Job_Ctx : Helpers.App_Job_Context; Unit : LAL.Analysis_Unit);
   --  Look for the use of access types in Unit

   package App is new Helpers.App
     (Name         => "ptrfinder1",
      Description  => "Look for the use of access types in the input sources",
      Process_Unit => Process_Unit);

Naturally, the Process_Unit procedure will be called once for each file to process. The Name and Description formals allow the automatic generation of a “help” message on the command-line (see later).

Implementing the Process_Unit procedure is as easy as making minor adjustments to Martyn’s original code:

   procedure Report (Node : LAL.Ada_Node'Class);
   --  Report the use of an access type at Filename/Line_Number on the
   --  standard output.

   ------------
   -- Report --
   ------------

   procedure Report (Node : LAL.Ada_Node'Class) is
      Filename : constant String := Node.Unit.Get_Filename;
      Line     : constant Slocs.Line_Number := Node.Sloc_Range.Start_Line;
   begin
      Put_Line
        (Filename & ":"
         & Ada.Strings.Fixed.Trim (Line'Image, Ada.Strings.Left));
   end Report;

   ------------------
   -- Process_Unit --
   ------------------

   procedure Process_Unit
     (Job_Ctx : Helpers.App_Job_Context; Unit : LAL.Analysis_Unit)
   is
      pragma Unreferenced (Job_Ctx);

      function Process_Node (Node : Ada_Node'Class) return Visit_Status;
      --  Callback for LAL.Traverse

      ------------------
      -- Process_Node --
      ------------------

      function Process_Node (Node : Ada_Node'Class) return Visit_Status is
      begin
         case Node.Kind is
            when Ada_Base_Type_Decl =>
               if Node.As_Base_Type_Decl.P_Is_Access_Type then
                  Report (Node);
               end if;

            when Ada_Object_Decl =>
               if Node.As_Object_Decl.F_Type_Expr
                  .P_Designated_Type_Decl.P_Is_Access_Type
               then
                  Report (Node);
               end if;

            when others =>
               --  Nothing interesting was found in this Node, so continue
               --  processing it for other violations.
               return Into;
         end case;

         --  A violation was detected: skip over any further processing of
         --  this node.
         return Over;
      end Process_Node;

   begin
      if not Unit.Has_Diagnostics then
         Unit.Root.Traverse (Process_Node'Access);
      end if;
   end Process_Unit;

We’re nearly done! All that’s left to do is to make our main program simply call the Run procedure:

   begin
      App.Run;
   end ptrfinder1;

That’s it. Build and run this program:

$ ./ptrfinder1
No source file to process

$ ./ptrfinder1 basic_pointers.adb
/tmp/access-type-detector/test/basic_pointers.adb:3
/tmp/access-type-detector/test/basic_pointers.adb:5

So far, so good.

$ ./ptrfinder1 --help
usage: ptrfinder1 [--help|-h] [--charset|-C CHARSET] [--project|-P PROJECT]
[--scenario-variable|-X
SCENARIO-VARIABLE[SCENARIO-VARIABLE...]] [--target TARGET]
[--RTS RTS] [--config CONFIG] [--auto-dir|-A
AUTO-DIR[AUTO-DIR...]] [--no-traceback] [--symbolic-traceback]
files [files ...]

Look for the use of access types in the input sources

positional arguments:
files                 Files to analyze

optional arguments:
--help, -h            Show this help message
[…]

Wow, that’s a lot! As you can see, App takes care of parsing command-line arguments and provides a lot of built-in options. Most of them are for the various ways to communicate to the application the set of source files to process:

• "ptrfinder1 source1.adb source2.adb …" will process all source files on the command-line, assuming that all source files belong to the current directory;
• "ptrfinder1 -P my_project.gpr [-XKEY=VALUE] [--target=…] [--RTS=…] [--config=…]" will process all source files that belong to the my_project.gpr project file. If additional  source files appear on the command-line, ptrfinder1 will process only them, but my_project.gpr will still be used to find the other source files.

• "ptrfinder1 --auto-dir=src1 --auto-dir=src2" will process all Ada source files that can be found in the src1 and src2 directories. Likewise, additional source files on the command-line will restrict processing to them.

These three use cases should cover most needs, the most reliable one being the project file way: calling gprbuild on the project file (with the same arguments) is a cheap way to check using the compiler that the set of sources passed to the application/Libadalang is complete, consistent and valid Ada.

As it is a common gotcha, let’s take a moment to note that even though your application may process only one source file, Libadalang may need access to other source files. For instance, computing the type of a variable in source1.adb may require reading pkg.ads, which defines the type of that variable. This is why passing a project file or --auto-dir options is useful even when you pass the list of source files to process explicitly on the command-line.

Martyn’s second program (ptrfinder2) doesn’t use Libadalang, so rewriting it to use App isn’t very interesting. Instead, let’s extend the previous program to run the text verification on the fly. We are going to add a command-line option to our application to optionally do the verification. Right after the App instantiation, add:

   package Do_Verify is new GNATCOLL.Opt_Parse.Parse_Flag
     (App.Args.Parser,
      Long => "--verify",
      Help => "Verify detected ""access"" occurences");

App’s command-line parser (App.Args.Parser) uses the GNATCOLL.Opt_Parse library, so adding support for new command-line options is very easy. Here, we add a flag, i.e. a switch with no argument: it’s either present or absent. Just doing this already extends the automatic help message:

$ ./ptrfinder1 --help
usage: ptrfinder1 […] files [files ...] [--verify]

Look for the use of access types in the input sources

positional arguments:
files                 Files to analyze

optional arguments:
[…]
--verify              Verify detected "access" occurences

Now we can modify the Report procedure to handle this option:

   function Verify
     (Filename : String; Line : Slocs.Line_Number) return Boolean;
   --  Return whether Filename can be read and whether its Line'th line
   --  contains the " access " substring.

   procedure Report (Node : LAL.Ada_Node'Class);
   --  Report the use of an access type at Filename/Line_Number on the
   --  standard output. If --verify is enabled, check that the first source
   --  line corresponding to Node contains the " access " substring.

   ------------
   -- Verify --
   ------------

   function Verify
     (Filename : String; Line : Slocs.Line_Number) return Boolean
   is
      --  Here, we could directly look for an "access" token in the list of
      --  tokens corresponding to Line in this unit. However, in the spirit
      --  of the original program, re-read the file with Ada.Text_IO.

      Found : Boolean := False;
      --  Whether we have found the substring on the expected line

      File : File_Type;
      --  File to read (Filename)
   begin
      Open (File, In_File, Filename);
      for I in 1 .. Line loop
         declare
            use type Slocs.Line_Number;
            Line_Content : constant String := Get_Line (File);
         begin
            if I = Line
              and then Ada.Strings.Fixed.Index (Line_Content, " access ") > 0
            then
               Found := True;
            end if;
         end;
      end loop;
      Close (File);
      return Found;
   exception
      when Use_Error | Name_Error | Device_Error =>
         Close (File);
         return Found;
   end Verify;

   ------------
   -- Report --
   ------------

   procedure Report (Node : LAL.Ada_Node'Class) is
      Filename   : constant String := Node.Unit.Get_Filename;
      Line       : constant Slocs.Line_Number := Node.Sloc_Range.Start_Line;
      Line_Image : constant String :=
        Ada.Strings.Fixed.Trim (Line'Image, Ada.Strings.Left);
   begin
      if Do_Verify.Get then
         if Verify (Filename, Line) then
            Put_Line
              ("Access Type Verified on line #" & Line_Image
               & " of " & Filename);
         else
            Put_Line
              ("Suspected Access Type *NOT* Verified on line #" & Line_Image
               & " of " & Filename);
         end if;
      else
         Put_Line (Filename & ":" & Line_Image);
      end if;
   end Report;

And voilà! Let’s check how it works:

$ ./ptrfinder1 basic_pointers.adb --verify
Access Type Verified on line #3 of /tmp/access-type-detector/test/basic_pointers.adb
Access Type Verified on line #5 of /tmp/access-type-detector/test/basic_pointers.adb

When writing Libadalang-based tools, don’t waste time with trivialities such as command-line parsing: use Libadalang.Helpers.App and go directly to the interesting parts!

You can find the compilable project for this post on my GitHub fork. Just make sure you get Libadalang 20.1 or the next Continuous Release (coming in February 2020). As usual, please send us suggestions and bug reports on GNATtracker (if you are an AdaCore customer) or on Libadalang’s GitHub project.


The GNAT-LLVM project provides an opportunity to port Ada to new platforms, one of which is WebAssembly. We conducted an experiment to evaluate porting Ada and developing bindings that let Ada applications use the Web APIs provided by the browser directly.

Preliminary Results

As a result of the experiment, the standard language library and runtime library were partially ported. Together with a binding for the Web API, this allowed us to write a simple example showing the possibility of using Ada for developing applications compiled into WebAssembly and executed inside the browser. At the same time, there are some limitations both of WebAssembly and of the current GNAT-LLVM implementation:

• the inability to use tasks and protected types
• support for exceptions limited to local propagation and the last chance handler
• the inability to use nested subprograms

Example

Here is a small example of an Ada program that shows/hides a text element when a button is pressed, by manipulating attributes of document nodes.

with Web.DOM.Event_Listeners;
with Web.DOM.Events;
with Web.HTML.Buttons;
with Web.HTML.Elements;
with Web.Strings;
with Web.Window;

package body Demo is

   function "+" (Item : Wide_Wide_String) return Web.Strings.Web_String
     renames Web.Strings.To_Web_String;

   type Listener is
     limited new Web.DOM.Event_Listeners.Event_Listener with null record;

   overriding procedure Handle_Event
     (Self  : in out Listener;
      Event : in out Web.DOM.Events.Event'Class);

   L : aliased Listener;

   ------------------
   -- Handle_Event --
   ------------------

   overriding procedure Handle_Event
     (Self  : in out Listener;
      Event : in out Web.DOM.Events.Event'Class)
   is
      X : Web.HTML.Elements.HTML_Element :=
        Web.Window.Document.Get_Element_By_Id (+"toggle_label");
   begin
      X.Set_Hidden (not X.Get_Hidden);
   end Handle_Event;

   ---------------------
   -- Initialize_Demo --
   ---------------------

   procedure Initialize_Demo is
      B : Web.HTML.Buttons.HTML_Button :=
        Web.Window.Document.Get_Element_By_Id
          (+"toggle_button").As_HTML_Button;
   begin
      B.Set_Disabled (False);
   end Initialize_Demo;

begin
   Initialize_Demo;
end Demo;


As you can see, it uses elaboration, tagged and interface types, and callbacks.

Setup & Build

To compile the examples, you need to set up GNAT-LLVM and the GNAT WASM run-time library by following the instructions in the README.md file.

To compile a specific example, use gprbuild to build the application, then open index.html in the browser to run it.

Next steps

The source code is published in a repository on GitHub and we invite everyone to participate in the project.



Like last year and the year before, AdaCore will participate in the celebration of Open Source software at FOSDEM. It is always a key event for the Ada/SPARK community, and we are looking forward to meeting Ada enthusiasts. You can check the program of the Ada/SPARK devroom here.

We have a talk in the Hardware Enablement devroom:

And there is a related talk in the Security devroom on the use of SPARK for security:

Hope to see you at FOSDEM this weekend!


In the last couple of years, the maker community has switched from AVR-based micro-controllers (popularized by Arduino) to the ARM Cortex-M architecture. AdaFruit was at the forefront of this migration, with boards like the Circuit Playground Express or some of the Feathers.

AdaFruit chose to adopt the Atmel (now Microchip) SAMD micro-controller family. Unfortunately for us, it is not among the platforms with the most Ada support so far (stay tuned, this might change soon ;)).

So I was quite happy to see AdaFruit release their first Feather format board including a micro-controller with plenty of Ada support, the STM32F4. I bought a board right away and implemented some support code for it.

The support for the Feather STM32F405 is now available in the Ada Drivers Library, along with two examples. The first just blinks the on-board LED and the second displays Make With Ada on a CharlieWing expansion board.

Setup

Once you have the Ada Drivers Library sources, run the script scripts/install_dependencies.py to install the run-time BSPs.

Build

To build the examples, open one of the project files examples/feather_stm32f405/blinky/blinky.gpr or examples/feather_stm32f405/charlie_wing/charlie_wing.gpr with GNATstudio (a.k.a. GPS), and click on the “Build all” icon.

Program the board

To program the example on the board, I recommend using the Black Magic Probe debugger (also available from AdaFruit). This neat little device provides a GDB remote server interface to the STM32F4, allowing you not only to program the micro-controller but also to debug it.

An alternative is to use the DFU mode of the STM32F4.

Happy hacking :)


For nearly four decades the Ada language (in all versions of the standard) has been helping developers meet the most stringent reliability, safety and security requirements in the embedded market. As such, Ada has become an entrenched player in its historic A&D niche, where its technical advantages are recognized and well understood. Ada has also seen usage in other domains (such as medical and transportation), but its penetration has progressed at a somewhat slower pace. In these other markets Ada stands in particular contrast with the C language, which, although suffering from extremely well known and documented flaws, remains a strong and seldom questioned default choice. Or at least, when it’s not the choice, C is still the starting point (a gateway drug?) for alternatives such as C++ or Java, which in the end still lack the software engineering benefits that Ada embodies.

Throughout AdaCore’s twenty-five year history, we’ve seen underground activities of software engineers willing to question the status quo and embark on new technological grounds. But driving such a change is a tough sell. While the merits of the language are usually relatively easy to establish, overcoming the surrounding inertia often feels like an insurmountable obstacle. Other engineers have to be willing to change old habits. Management has to be willing to invest in new technology. All have to agree on the need for safer, more secure and more reliable software. Even if we’ve been able to report some successes over the years, we were falling short of the critical mass.

Or so it seemed.

The tide has turned. 2018 and 2019 have been exceptional vintages in terms of Ada and SPARK adoption, and all the signs show that 2020 will be at least as exciting. What’s more, the new adopters are coming from industries that were never part of the initial Ada and SPARK user base. What used to be inertia is now momentum. Let’s take a look at the information that can be gathered from the web over the past two years to demonstrate the new dynamic of Ada and SPARK usage.

The Established User Base

Due to the nature of the domain, it is difficult to communicate specifically about these projects, and news is scarce. One measure of the increasing interest in Ada and SPARK can be inferred from defense-driven research projects which contain references to these language technologies. The most notable example is the recent UK-funded HICLASS project, focused on security, which involves a large portion of the UK defense industry. Some press releases are also available, in particular in the space domain (European Space Agency, AVIO and MDA). These data samples are representative of a very active and vibrant community that is committed to Ada and SPARK for decades to come, effectively guaranteeing their industrial future as far as we can reasonably foresee.

The so-called “established user base” fueled the Ada and SPARK community up until roughly the mid-2010s. At that point, a new trend started to emerge, from users and use cases that we had never seen before. While each case is a story in its own right, some common patterns have emerged. The starting point is almost always either an increase in safety or security requirements, or a wish to reduce the development costs of an application with some kind of high-reliability needs. This is connected to the acknowledgement that the programming language in use, almost exclusively C or C++, may not be the optimal language to reach these goals. This is well documented in the industry; the flaws of C and C++ have been the subject of countless papers, the source of catastrophic vulnerability exploits, and the motivation for tools to work around the issues. The technical merits of Ada and its ability to prevent many of these issues are also well documented; we even have access to some measurements of the effects. The most recent one is an independent study by VDC, which measured up to 38% cost savings for Ada over C in the context of high-integrity markets that have long adopted Ada.

We’re talking a lot about Ada here, but in fact new adopters are typically driven by a mix of SPARK and Ada. The promise of SPARK is automatic verification of software properties such as absence of buffer overflow, together with stringent mitigation of others, and this by design, early in the development process. This means that developers are able to self-check their code: not only is the final code more reliable, it is more reliable as soon as it is written, avoiding many mistakes that could otherwise slip through the testing, integration, or deployment phases.

Some of the SPARK adopters motivated by these benefits come from academia. Over the past two years, over 40 universities have joined the GNAT Academic Program (“GAP”), with a mix of teaching and research activities, including, for example, the FH Campus Wien train project, CubeSat, and UPMSat-2.

Many adopters can also be found in industry. Some of the following references highlight teams at the research phase, some others represent projects already deployed. They all however contribute to this solid wave of new Ada and SPARK adopters. The publications referenced in the following paragraphs have been published between 2018 and 2019.

One obvious application for Ada and SPARK, where human lives are at risk, is the medical device domain. So it comes as no surprise that this area is among those adopting the technology. Two interesting cases come to mind. The first is RealHeart, a Scandinavian manufacturer developing an artificial heart with on-board software written in Ada and SPARK; the company issued a press release and later gave an in-depth presentation at the SPARK & Frama-C days. The second reference comes from a large medical device corporation, Hillrom, which published a paper explaining the rationale for selecting SPARK and Ada for the development of ECG algorithms.

Another domain is everything that relates to security. The French security agency ANSSI studied various languages to implement a secure USB key and selected SPARK as the best choice. They published a research paper, a presentation and source code. Another interesting new application has been implemented by the German company Componolit, which is developing proven communication protocols.

Of course, established markets are also joining the party. The University of Colorado’s Laboratory for Atmospheric and Space Physics has recently adopted Ada to develop an application for the International Space Station. In the defense domain, the Air Force Research Labs is studying the re-writing of a drone framework from C++ to SPARK and doing functional proofs, with a public research paper and source code available.

While all of these domains provide interesting adopter stories, the one single domain that has demonstrated the most interest in the recent past is undoubtedly automotive. This is probably driven by the increasing complexity of electronic systems in cars, with applications such as Advanced Driver Assistance Systems (ADAS) and autonomous vehicles. References in this domain range from tier 1 suppliers such as Denso and JTEKT to OEMs and autonomous vehicle companies like Volvo’s subsidiary Zenuity.

And there’s NVIDIA.

In January of this year, we published with NVIDIA a press release and a blog post, followed up this November by a presentation at our annual Tech Days conference and an on-line webex (also see the slides for the webex). In many respects, this is a unique tipping point in the history of Ada adoption in terms of impact in a non-A&D domain, touching considerations ranging from security to automotive safety, all under the tight constraints of firmware development. The webex in particular provides a unique dive into the reasons behind the adoption of SPARK and Ada by a company that didn’t have any particular ties to it initially. It also gives key insights on the challenges and costs of such an adoption, together with the benefits already observed. In many respects, it is almost an adoption guide to the technology from a business standpoint.

Wrapping Up

Keep in mind that the above references are only those that are publicly available, which we know about. There are many more projects under the hood, and even more that we’re not even aware of. Everything considered, this is a very exciting time for the Ada and SPARK languages. Stay tuned, we have an array of new stories coming up for the months and years to come!


What's changed?

In 2019 AdaCore created a UK business unit and embarked on a new and collaborative venture researching and developing advanced UK aerospace systems. This blog introduces the reader to ‘HICLASS’, describes our involvement and explains how participation in this project is aligned with AdaCore’s core values.

Introducing HICLASS

The “High-Integrity, Complex, Large, Software and Electronic Systems” (HICLASS) project was created to enable the delivery of the most complex, software-intensive, safe and cyber-secure systems in the world. HICLASS is a strategic initiative to drive new technologies and best-practice throughout the UK aerospace supply chain, enabling the UK to affordably develop systems for the growing aircraft and avionics market expected over the coming decades. HICLASS includes key prime contractors, system suppliers, software tool vendors and Universities working together to meet the challenges of growing system complexity and size. HICLASS will allow the development of new, complex, intelligent and internet-connected electronic products that are safe and secure from cyber-attack and can be affordably certified.

The HICLASS project is supported by the Aerospace Technology Institute (ATI) Programme, a joint Government and industry investment to maintain and grow the UK’s competitive position in civil aerospace design and manufacture. The programme, delivered through a partnership between the ATI, Department for Business, Energy & Industrial Strategy (BEIS) and Innovate UK, addresses technology, capability and supply chain challenges.

The £32m investment programme, led by Rolls-Royce Control Systems, focuses on the UK civil aerospace sector but also has direct engagement with the Defence Science and Technology Laboratory (DSTL). The collaborative group, comprised of 16 funded partners and 2 unfunded partners, is made up of the following system developers, tool suppliers and academic institutions: AdaCore, Altran, BAE Systems, Callen-Lenz, Cobham, Cocotec, D-Risq, GE Aviation, General Dynamics UK, Leonardo, MBDA, University of Oxford, Rapita Systems, Rolls-Royce, University of Southampton, Thales, Ultra Electronics and University of York. As well as researching and developing advanced aerospace capabilities, the group aims to pool niche skills and build a highly collaborative community based around an enhanced understanding of shared problems. The project is split into 4 main work packages. Two technology work packages focus on integrated model-based engineering, cyber-secure architectures and mechanisms, high-integrity connectivity, networks and data distribution, advanced hardware platforms and smart sensors, and advanced software verification capabilities. In addition, a work package will ensure domain exploitation and drive a cross-industry cyber-security regulatory approach for avionics. A final work package will see the development of integrated HICLASS technology demonstrators.

Introducing ASSET

HICLASS also aims to build, promote and manage the Aerospace Software Systems Engineering and Technology (ASSET) partnership. This community is open to all organisations undertaking technical work in aerospace software and systems engineering in the UK and operates in a manner designed to promote sharing, openness and accessibility. Unlike HICLASS, ASSET publications are made under a Creative Commons Licence, and the group operates without any non-disclosure or collaboration agreements.

AdaCore's R&D Work in the UK

Within HICLASS, AdaCore is working with partners across multiple work packages and is also leading a work package titled “SPARK for HICLASS”. This work package will develop and extend multiple SPARK-related technologies in order to satisfy industrial partners’ HICLASS requirements regarding safety and cyber-security.

SPARK is a globally recognised safety and security profile of Ada, the software programming language defined by ISO/IEC 8652:2012. Born out of a UK MOD sponsored research project, the first version of SPARK, based on Ada 83, was initially produced at the University of Southampton. Since then the technology has been progressively extended and refined, and the latest version, SPARK 2014, based on Ada 2012, is now maintained and developed by AdaCore and Altran in partnership. Due to its rich pedigree, earned at the forefront of high-integrity software assurance, SPARK plays a big part in AdaCore’s safe and secure software development tool offerings. Through focused and collaborative research and development, AdaCore will guide the evolution of multiple SPARK-related technologies towards a level where they are suitable for building demonstrable, safe and secure cyber-physical systems that meet the software implementation and verification requirements of HICLASS developed by UK Plc.

New extensions to the SPARK language, specific to HICLASS systems, will be developed; these will include the verification of cyber-safe systems and auto-generated code. AdaCore also plans to mature reusable SPARK code modules, driven by the needs of our partners, to provide high-assurance reusable SPARK libraries that reduce development time and verification costs.

QGen, a qualifiable and tuneable code generation and model verification tool suite for a safe subset of Simulink® and Stateflow® models, is a game changer in Model Based Software Engineering (MBSE). For HICLASS, AdaCore will place an emphasis on the fusion of SPARK verification capabilities and HICLASS-related emerging MBSE tools, allowing code-level verification to be achieved at the model level. The generation of SPARK code from our QGen tool, as well as from various HICLASS partners’ MBSE technologies, will be researched and developed. Collaborative case studies will be performed to assess and measure success. Collaboration is a key critical success factor in meeting this objective; multiple HICLASS partners are developing MBSE tools, and SPARK evolution will be achieved in close partnership with them.

The second, and complementary, objective of this work package is to research and develop cyber-secure countermeasures and HICLASS verification strategies, namely in the form of compiler hardening and the development of a ‘fuzzing’ capability for Ada/SPARK. HICLASS case studies, produced within preceding work packages, will be observed to ensure our SPARK work package is aligned with HICLASS-specific standards, guidelines and recommendations and to ensure the relevancy of the work package deliverables.

The third objective is for AdaCore, in collaboration with our HICLASS partners, to evaluate QGen, and associated formal approaches, for existing UK aerospace control systems and to make comparisons with existing Simulink code generation processes. In addition, AdaCore will promote processor emulation technology through a collaborative HICLASS case study.

The final objective is to demonstrate the work package technology through the creation of a software stack capable of executing SPARK software on a range of (physical and emulated) target processors suitable for use in HICLASS. The ability to execute code generated from MBSE environments will also be demonstrated.

Committing Investment into the UK

AdaCore has a long history of working with partners within the UK aerospace industry on safety-related, security-related and mission-critical software development projects. Participation in the HICLASS research and development group complements AdaCore’s commitment to invest within the UK. This four-year research project is also an excellent fit with AdaCore’s core values and its existing and future capabilities. In addition, the creation of a new UK business unit, ‘AdaCore Ltd’, which will rapidly grow into our UK Centre of Excellence, ensures that our existing and future UK aerospace customers will continue to receive the high level of technical expertise and quality products associated with AdaCore.

History has shown that the UK aerospace industry isn’t afraid to be ambitious and has the technological capability to stay at the forefront of this rapidly growing sector. With HICLASS, the sky really is the limit, and AdaCore welcomes the opportunity to be a part of the journey and further extend our partnerships within this technologically advanced and continually growing market.

Further information about the ATI, BEIS and IUK...

Aerospace Technology Institute (ATI)

The Aerospace Technology Institute (ATI) promotes transformative technology in air transport and supports the funding of world-class research and development through the multi-billion pound joint government-industry programme. The ATI stimulates industry-led R&D projects to secure jobs, maintain skills and deliver economic benefits across the UK.

Setting a technology strategy that builds on the UK’s strengths and responds to the challenges faced by the UK civil aerospace sector, the ATI provides a roadmap of the innovation necessary to keep the UK competitive in the global aerospace market, and complements the broader strategy for the sector created by the Aerospace Growth Partnership (AGP).

The ATI provides strategic oversight of the R&T pipeline and portfolio. It delivers the strategic assessment of project proposals and provides funding recommendations to BEIS.

Department for Business, Energy and Industrial Strategy (BEIS)

The Department for Business, Energy and Industrial Strategy (BEIS) is the government department accountable for the ATI Programme. As the budget holder for the programme, BEIS is accountable for the final decision regarding which projects to progress and fund with Government resources, as well as performing the Value for Money (VfM) assessment on all project proposals, one of the 3 ATI Programme assessment streams.

Innovate UK (IUK)

Innovate UK is the funding agency for the ATI Programme. It delivers the competitions process including independent assessment of project proposals, and provides funding recommendations to BEIS. Following funding award, Innovate UK manages the programme, from contracting projects, through to completion.

Innovate UK is part of UK Research and Innovation (UKRI), a non-departmental public body funded by a grant-in-aid from the UK government. Innovate UK drives productivity and economic growth by supporting businesses to develop and realise the potential of new ideas, including those from the UK’s world-class research base.

UKRI is the national funding agency investing in science and research in the UK. Operating across the whole of the UK with a combined budget of more than £6 billion, UKRI brings together the 7 Research Councils, Innovate UK and Research England.


I’ve been telling Ada developers for a while now that Libadalang will open up the possibility of more easily writing Ada source code analysis tools. (You can read more about Libadalang here and here and can also access the project on GitHub.)

Along these lines, I recently had a discussion with a customer about whether there were any tools for detecting uses of access types in their code, which got me thinking about possible ways to detect the use of access types in a set of Ada source code files.

GNATcheck doesn't currently have a rule that prohibits the use of access types. Also, SPARK 2014 recently added support for access types, whereas previously they were banned. So while earlier versions of GNATprove could detect them quite effectively, the latest and future versions may not.

I decided to architect a solution to this problem and determined there were several implementation options open to me:

1. Use ‘grep’ on a set of Ada sources to find instances of the "access" Ada keyword
2. Use gnat2xml and then use ‘grep’ on the resulting output to search for certain tags
3. Use gnat2xml and then write an XML-aware search utility to search for certain tags
4. Write a dedicated search utility in Ada using Libadalang
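Option 1, for instance, amounts to a plain keyword scan. A few lines of Python (used here purely for illustration, not part of any of the tools discussed) mimic it, and also show its weakness: a textual match cannot distinguish an access type definition from the word "access" in a comment or string:

```python
import re

def keyword_scan(filename: str) -> list:
    """Return the numbers of lines containing the Ada keyword 'access'.

    A naive textual scan: it flags the word 'access' wherever it
    appears, including inside comments and string literals.
    """
    hits = []
    with open(filename, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            # Ada is case-insensitive, so match the keyword regardless of case
            if re.search(r"\baccess\b", line, re.IGNORECASE):
                hits.append(number)
    return hits
```

Running this over a unit that merely mentions "access" in a comment reports a false positive, which is exactly why a syntax-aware approach is preferable.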

Options 1 and 2 just feel too easy and would defeat the purpose of this blog post.

Option 3 is perhaps a good topic for another post related to using XML/Ada; however, I decided to put my money where my mouth is and go with Option 4!

While I wrote this program in Ada, I could also have written it in Python using Libadalang's Python bindings.

So here is the program:

with Ada.Text_IO;         use Ada.Text_IO;
with Libadalang.Analysis; use Libadalang.Analysis;
with Libadalang.Common;   use Libadalang.Common;

procedure ptrfinder1 is

   LAL_CTX : constant Analysis_Context := Create_Context;

begin

   Read_Standard_Input:
   while not End_Of_File(Standard_Input)
   loop

      Process_Standard_Input:
      declare

         Filename : constant String := Get_Line;

         Unit : constant Analysis_Unit := LAL_CTX.Get_From_File(Filename);

         function Process_Node(Node : Ada_Node'Class) return Visit_Status is
         begin

            --  Report the access type related node kinds
            if Node.Kind in Ada_Type_Access_Def
                          | Ada_Anonymous_Type_Access_Def
                          | Ada_Access_To_Subp_Def
            then
               Put_Line(Filename & ":" & Node.Sloc_Range.Start_Line'Img);
            end if;

            return Into;

         end Process_Node;

      begin

         if not Unit.Has_Diagnostics then
            Unit.Root.Traverse(Process_Node'Access);
         end if;

      end Process_Standard_Input;

   end loop Read_Standard_Input;

end ptrfinder1;

I designed the program to read a series of fully qualified absolute filenames from standard input and process each of them in turn.  This approach made the program much easier to write and test and,  as you'll see, allowed the program to be integrated effectively with other tools.

Let's deconstruct the code a little....

For each provided filename,  the program creates a Libadalang Analysis_Unit for that filename.

Read_Standard_Input:
while not End_Of_File(Standard_Input)
loop

declare

Filename : constant String := Get_Line;

Unit : constant Analysis_Unit :=
LAL_CTX.Get_From_File(Filename);

As long as it has no issues,  the Ada unit is traversed and the Process_Node subprogram is executed for each detected node.

if not Unit.Has_Diagnostics then
Unit.Root.Traverse(Process_Node'Access);
end if;

The Process_Node subprogram checks the Kind field of the detected Ada_Node'Class parameter to see if it is any of the access type related nodes.  If so,  the program outputs the fully qualified filename, a ":" delimiter, and the line number of the detected node.

function Process_Node(Node : Ada_Node'Class) return Visit_Status is
begin

   if Node.Kind in Ada_Type_Access_Def
                 | Ada_Anonymous_Type_Access_Def
                 | Ada_Access_To_Subp_Def
   then
      Put_Line(Filename & ":" & Node.Sloc_Range.Start_Line'Img);
   end if;

   return Into;

end Process_Node;

At the end of the Process_Node subprogram,  the returned value allows the traversal to continue.

To make the program a more useful tool within a development environment based on GNAT Pro,  I integrated it with the piped output of the 'gprls' program.

gprls is a tool that outputs information about compiled sources. It gives the relationship between objects, unit names, and source files. It can also be used to check source dependencies as well as various other characteristics.

My program can then be invoked as part of a more complex command line:

$ gprls -s -P test.gpr | ./ptrfinder1

Given the following content of test.gpr:

project Test is
   for Languages use ("Ada");
   for Source_Dirs use (".");
   for Object_Dir use "obj";
end Test;

Plus an Ada source code file called inc_ptr1.adb (in the same directory as test.gpr) containing the following:

procedure Inc_Ptr1 is
   type Ptr is access all Integer;
begin
   null;
end Inc_Ptr1;

The resulting output from the integration of gprls and my program is:

/home/pike/Workspace/access-detector/test/inc_ptr1.adb:3

This output correctly identified the access type usage on line 3 of inc_ptr1.adb.

But how do I know that my program or indeed Libadalang has functioned correctly? I decided to stick in principle to the UNIX philosophy of "Do One Thing and Do it Well" and write a second program to verify the output of my first program using a simple algorithm. This second program is given a filename and line number and verifies that the keyword "access" appears on the specified line number. Of course, I could also have embedded this verification into the first program, but to illustrate a point about diversity I chose not to.
with Ada.Text_IO;       use Ada.Text_IO;
with Ada.Directories;   use Ada.Directories;
with Ada.Strings;       use Ada.Strings;
with Ada.Strings.Fixed; use Ada.Strings.Fixed;
with Ada.IO_Exceptions;

procedure ptrfinder2 is
begin

   Read_Standard_Input:
   while not End_Of_File(Standard_Input)
   loop

      Process_Standard_Input:
      declare

         Std_Input : constant String := Get_Line;

         Delimeter_Position : constant Natural := Index(Std_Input,":");

         Line_Number_As_String : constant String :=
           Std_Input(Delimeter_Position+1..Std_Input'Last);

         Line_Number : constant Integer :=
           Integer'Value(Line_Number_As_String);

         Filename : constant String :=
           Std_Input(Std_Input'First..Delimeter_Position-1);

         The_File : File_Type;

         Verified : Boolean := False;

      begin

         if Ada.Directories.Exists(Filename)
           and then Line_Number > 1
         then

            Open(File => The_File, Mode => In_File, Name => Filename);

            Locate_Line:
            for I in 1..Line_Number loop
               Verified := Index(Get_Line(The_File)," access ") > 0;
               exit when Verified or else End_Of_File(The_File);
            end loop Locate_Line;

            Close(File => The_File);

         end if;

         if Verified then
            Put_Line("Access Type Verified on line #" &
                     Line_Number_As_String & " of " & Filename);
         else
            Put_Line("Suspected Access Type *NOT* Verified on line #" &
                     Line_Number_As_String & " of " & Filename);
         end if;

      end Process_Standard_Input;

   end loop Read_Standard_Input;

end ptrfinder2;

I can then string the first and second program together:

$ gprls -s -P test.gpr | ./ptrfinder1 | ./ptrfinder2

This produces the output:

Access Type Verified on line #3 of /home/pike/Workspace/access-detector/test/inc_ptr1.adb

It goes without saying that a set of Ada sources with no Access Type usage will result in no output from either the first or second program.
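The verification algorithm is simple enough to sketch in Python as well (an illustrative re-implementation, not one of the two published programs; unlike ptrfinder2, this version checks only the exact reported line):

```python
def verify_access_line(line_spec: str, keyword: str = "access") -> bool:
    """Check a 'filename:line' location produced by the first program.

    Returns True only when the named line of the named file actually
    contains the keyword; an out-of-range line number yields False.
    """
    # Split on the LAST colon so the filename part may contain colons
    filename, _, line_number = line_spec.strip().rpartition(":")
    target = int(line_number)
    with open(filename, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if number == target:
                return keyword in line
    return False  # file is shorter than the reported line number
```

The same diversity argument applies: this checker shares no code with the Libadalang-based detector, so agreement between the two increases confidence in both.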

This expedition into Libadalang has reminded me how extremely effective Ada can be at writing software development tools.

The two programs described in this blog post were built and tested on 64-bit Ubuntu 19.10 using GNAT Pro and Libadalang.  They are also known to build successfully with the 64-bit Linux version of GNAT Community 2019.

RecordFlux: From Message Specifications to SPARK Code https://blog.adacore.com/recordflux-from-message-specifications-to-spark-code Thu, 17 Oct 2019 13:08:23 +0000 Alexander Senier https://blog.adacore.com/recordflux-from-message-specifications-to-spark-code

Software constantly needs to interact with its environment. It may read data from sensors, receive requests from other software components or control hardware devices based on the calculations performed. While this interaction is what makes software useful in the first place, processing messages from untrusted sources inevitably creates an attack vector an adversary may use to exploit software vulnerabilities. The infamous Heartbleed bug is only one example where security-critical software was attacked through specially crafted messages.

Implementing those interfaces to the outside world in SPARK and proving the absence of runtime errors is a way to prevent such attacks. Unfortunately, manually implementing and proving message parsers is a tedious task which needs to be redone for every new protocol. In this article we'll discuss the challenges that arise when creating provable message parsers and present RecordFlux, a framework which greatly simplifies this task.

Specifying Binary Messages

Ethernet: A seemingly simple example

At first glance, Ethernet has a simple structure: a 6-byte destination field, a 6-byte source field and a 2-byte type field, followed by the payload:
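To make the byte offsets of this naive fixed layout concrete, here is an illustrative Python sketch (not RecordFlux, and not part of the tooling discussed below):

```python
import struct

def parse_naive_ethernet(frame: bytes) -> dict:
    """Split a raw frame along the naive fixed layout:
    6-byte destination, 6-byte source, 2-byte type, rest payload."""
    if len(frame) < 14:
        raise ValueError("frame too short for an Ethernet header")
    destination, source = frame[0:6], frame[6:12]
    # The type field is a big-endian 16-bit value
    (ether_type,) = struct.unpack("!H", frame[12:14])
    return {
        "destination": destination,
        "source": source,
        "type": ether_type,
        "payload": frame[14:],
    }
```

As the following paragraphs show, this fixed-offset reading is exactly what breaks down once the real EtherType semantics and optional tags enter the picture.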

We could try to model an Ethernet frame as a simple record in SPARK:

package Ethernet is

   type Byte is mod 2**8;
   type Address is array (1 .. 6) of Byte;
   type Type_Length is mod 2**16;
   type Payload is array (1 .. 1500) of Byte;

   type Ethernet_Frame is
      record
         Destination : Address;
         Source      : Address;
         EtherType   : Type_Length;
         Data        : Payload;
      end record;

end Ethernet;

When looking closer, we realize that this solution is a bit short-sighted. Firstly, defining the payload as a fixed-size array as above will either waste memory when handling a lot of small (say, 64 byte) frames or be too short when handling jumbo frames which exceed 1500 bytes. More importantly, the Ethernet header is not as simple as we pretended earlier. Looking at the standard, we realize that the EtherType field actually has more complicated semantics, to allow different frame types to coexist on the same medium.

If the value of EtherType is greater than or equal to 1536, then the frame is an Ethernet II frame. The EtherType is treated as a type field which determines the protocol contained in Data. In that case, the Data field uses up the remainder of the Ethernet frame. If the value of EtherType is less than or equal to 1500, then the frame is an IEEE 802.3 frame and the EtherType field represents the length of the Data field. Frames with an EtherType value between 1501 and 1535 are considered invalid.
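The case analysis can be summarized in a few lines (an illustrative Python sketch of the standard's rule, not part of any tool):

```python
def classify_type_length(value: int) -> str:
    """Interpret the 16-bit EtherType/length field as the standard does."""
    if value >= 1536:
        return "Ethernet II"   # the field names the protocol in Data
    if value <= 1500:
        return "IEEE 802.3"    # the field is the length of Data
    return "invalid"           # 1501 .. 1535 is reserved
```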

To make things even worse, both variants may contain an optional IEEE 802.1Q tag to identify the frame's priority and VLAN. The tag is inserted after the source field and is itself composed of two 16-bit fields, TPID and TCI. It is present if the bytes that would contain the TPID field have a hexadecimal value of 8100. Otherwise these bytes contain the EtherType field.
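Again purely as an illustrative sketch, the tag detection and its effect on where EtherType actually lives can be written as:

```python
def has_vlan_tag(frame: bytes) -> bool:
    """An IEEE 802.1Q tag is present when the two bytes after the
    source field (offset 12) hold the value 16#8100#."""
    return frame[12:14] == b"\x81\x00"

def ether_type_offset(frame: bytes) -> int:
    """The EtherType field shifts by the 4-byte tag when one is present."""
    return 16 if has_vlan_tag(frame) else 12
```

Note how the same two bytes are interpreted either as TPID or as EtherType depending on their value; this is the overlay that the RecordFlux specification below expresses declaratively.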

Lastly, the Data field will usually contain higher-level protocols. Which protocol is contained and how the payload is to be interpreted depends on the value of EtherType. With our naive approach above, we have to manually convert Data into the correct structured message format. Without tool support, this conversion will be another source of errors.

Formal Specification with RecordFlux

Next, we'll specify the Ethernet frame format using the RecordFlux domain-specific language and demonstrate how the specification is used to create real-world parsers. RecordFlux deliberately has a syntax similar to SPARK, but deviates where more expressiveness was required to specify complex message formats.

Packages and Types

Just like SPARK, entities are grouped into packages. By convention, a package contains one protocol like IPv4, UDP or TLS. A protocol will typically define many types and different message formats. Range types and modular types are identical to those found in SPARK. Just like in SPARK, entities can be qualified using aspects, e.g. to specify the size of a type using the Size aspect:

package Ethernet is

type Type_Length is range 46 .. 2**16 - 1 with Size => 16;
type TPID is range 16#8100# .. 16#8100# with Size => 16;
type TCI is mod 2**16;

end Ethernet;

The first difference from SPARK is the message keyword, which is similar to a record but has important differences to support the non-linear structure of messages. The equivalent of the naive Ethernet specification in RecordFlux syntax would be:

type Simplified_Frame is
   message
      Type_Length : Type_Length;
   end message;

Graph Structure

As argued above, such a simple specification is insufficient to express the complex corner-cases found in Ethernet. Luckily, RecordFlux messages allow for expressing conditional, non-linear field layouts. While SPARK records are linear sequences of fixed-size fields, messages should rather be thought of as a graph of fields where the next field, its start position, its length and constraints imposed by the message format can be specified in terms of other message fields. To ensure that the parser generated by RecordFlux is deadlock-free and able to parse messages sequentially, conditionals must only reference preceding fields.

We can extend our simple example above to express the relation of the value of Type_Length and length of the payload field:

   ...
   Type_Length : Type_Length
      then Data
         with Length => Type_Length * 8
         if Type_Length <= 1500,
      then Data
         with Length => Message'Last - Type_Length'Last
         if Type_Length >= 1536;
   ...

For a field, the optional then keyword defines the field to follow in the graph. If that keyword is missing, this defaults to the next field appearing in the message specification as in our Simplified_Frame example above. To have different fields follow under different conditions, an expression can be added using the if keyword. Furthermore, an aspect can be added using the with keyword, which can be used to conditionally alter properties like start or length of a field. If no succeeding field is specified for a condition, as for Type_Length in the range between 1501 and 1535, the message is considered invalid.

In the fragment above, we use the value of the field Type_Length as the length of the Data field if its value is less than or equal to 1500 (the IEEE 802.3 case). If Type_Length is greater than or equal to 1536, we calculate the payload length by subtracting the end of the Type_Length field from the end of the message. The 'Last (and also 'First and 'Length) attribute is similar to the respective SPARK attribute, but refers to the bit position (or bit length) of a field within the message. The Message field is special and refers to the whole message being handled.
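To make the bit arithmetic concrete, here is an illustrative Python rendering of the two Length branches (positions are in bits, as in RecordFlux; the function and parameter names are our own):

```python
def data_length_bits(type_length: int,
                     message_last_bit: int,
                     type_length_last_bit: int) -> int:
    """Mirror the two 'with Length' branches of the specification."""
    if type_length <= 1500:
        # IEEE 802.3: the field holds the payload length in bytes
        return type_length * 8
    if type_length >= 1536:
        # Ethernet II: the payload runs to the end of the message
        return message_last_bit - type_length_last_bit
    raise ValueError("invalid EtherType/length value")
```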

Optional Fields

The graph structure described above can also be used to handle optional fields, as for the IEEE 802.1Q tag in Ethernet. Let's have a look at the full Ethernet specification first:

package Ethernet is

   type Type_Length is range 46 .. 2**16 - 1 with Size => 16;
   type TPID is range 16#8100# .. 16#8100# with Size => 16;
   type TCI is mod 2**16;

   type Frame is
      message
         Type_Length_TPID : Type_Length
            then TPID
               with First => Type_Length_TPID'First
               if Type_Length_TPID = 16#8100#,
            then Type_Length
               with First => Type_Length_TPID'First
               if Type_Length_TPID /= 16#8100#;
         TPID : TPID;
         TCI : TCI;
         Type_Length : Type_Length
            then Data
               with Length => Type_Length * 8
               if Type_Length <= 1500,
            then Data
               with Length => Message'Last - Type_Length'Last
               if Type_Length >= 1536;
         then null
            if Data'Length / 8 >= 46
            and Data'Length / 8 <= 1500;
      end message;

end Ethernet;

Most concepts should look familiar by now. The null field used in the then expression of the Data field is just a way to state that the end of the message is expected. This way, we are able to express that the payload length must be between 46 and 1500. As there is only one then branch for payload (pointing to the end of the message), values outside this range will be considered invalid. This is the general pattern to express invariants that have to hold for a message.
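The invariant enforced by that null branch corresponds to this simple check (illustrative Python mirroring the Data'Length condition; not generated code):

```python
def payload_length_valid(payload_bits: int) -> bool:
    """Accept the end of message only when the payload is between
    46 and 1500 bytes, as in:
       Data'Length / 8 >= 46 and Data'Length / 8 <= 1500
    """
    payload_bytes = payload_bits // 8
    return 46 <= payload_bytes <= 1500
```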

How can this be used to model optional fields of a message? We just need to cleverly craft the conditions and overlay the following alternatives. The relevant section of the above Ethernet specification is the following:

   ...
   Type_Length_TPID : Type_Length
      then TPID
         with First => Type_Length_TPID'First
         if Type_Length_TPID = 16#8100#,
      then Type_Length
         with First => Type_Length_TPID'First
         if Type_Length_TPID /= 16#8100#;
   TPID : TPID;
   TCI : TCI;
   Type_Length : Type_Length
   ...

Remember that the optional IEEE 802.1Q tag consisting of the TPID and TCI fields is present after the Source only if the bytes that would contain the TPID field are equal to a hexadecimal value of 8100. We introduce a field Type_Length_TPID only for the purpose of checking whether this is the case. To avoid any confusion when using the parser later, we will overlay this field with properly named fields. If Type_Length_TPID equals 16#8100# (SPARK-style numerals are supported in RecordFlux), we define the next field to be TPID and set its first bit to the 'First attribute of the Type_Length_TPID field. If Type_Length_TPID does not equal 16#8100#, the next field is Type_Length, skipping TPID and TCI.

As stated above, the specification actually is a graph with conditions on its edges. Here is an equivalent graph representation of the full Ethernet specification:

Working with RecordFlux

RecordFlux comes with the command line tool rflx, which parses specification files, transforms them into an internal representation and subsequently generates SPARK packages that can be used to parse the specified messages:

To validate specification files, which conventionally have the file extension .rflx, run RecordFlux in check mode. Note that RecordFlux does not support search paths at the moment and all files need to be passed on the command line:

$ rflx check ethernet.rflx
Parsing ethernet.rflx... OK

Code Generation

Code is generated with the generate subcommand, which expects a list of RecordFlux specifications and an output directory for the generated SPARK sources. Optionally, a root package can be specified for the generated code using the -p switch:

$ rflx generate -p Example ethernet.rflx outdir
Parsing ethernet.rflx... OK
Generating... OK
Created outdir/example-scalar_sequence.adb

Usage

To use the generated code, we need to implement a simple main program. It allocates a buffer to hold the received Ethernet packet and initializes the parser context with the pointer to that buffer. After verifying the message and checking whether it has the correct format, its content can be processed by the application.

with Ada.Text_IO; use Ada.Text_IO;
with Example.Ethernet.Frame;
with Example.Types;

procedure Ethernet
is
   package Eth renames Example.Ethernet.Frame;
   subtype Packet is Example.Types.Bytes (1 .. 1500);
   Buffer : Example.Types.Bytes_Ptr := new Packet'(Packet'Range => 0);
   Ctx    : Eth.Context := Eth.Create;
begin
   --  Retrieve the packet
   Buffer (1 .. 56) :=
     (16#ff#, 16#ff#, 16#ff#, 16#ff#, 16#ff#, 16#ff#, 16#04#, 16#d3#,
      16#b0#, 16#ab#, 16#f9#, 16#31#, 16#08#, 16#06#, 16#00#, 16#01#,
      16#08#, 16#00#, 16#06#, 16#04#, 16#00#, 16#01#, 16#04#, 16#d3#,
      16#b0#, 16#9c#, 16#79#, 16#53#, 16#ac#, 16#12#, 16#fe#, 16#b5#,
      16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#ac#, 16#12#,
      16#64#, 16#6d#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#,
      16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#);

   Eth.Initialize (Ctx, Buffer);
   Eth.Verify_Message (Ctx);
   if Eth.Structural_Valid_Message (Ctx) then
      Put_Line ("Source: " & Eth.Get_Source (Ctx)'Img);
      if Eth.Present (Ctx, Eth.F_TCI) and then Eth.Valid (Ctx, Eth.F_TCI)
      then
         Put_Line ("TCI: " & Eth.Get_TCI (Ctx)'Img);
      end if;
   end if;
end Ethernet;

The code can be proven directly using gnatprove (the remaining warning is a known issue in GNAT Community 2019 as explained in the Known Issues section of the README):

$ gnatprove -P ethernet.gpr
Phase 1 of 2: generation of Global contracts ...
Phase 2 of 2: flow analysis and proof ...
ethernet.adb:22:25: warning: unused assignment to "Buffer"
gnatprove: error during flow analysis and proof

What guarantees can we obtain from this proof? Firstly, the absence of runtime errors is shown for the generated code as well as for the user code. No matter what input is read into the Buffer variable and presented to the parser, the program does not raise an exception at runtime. Furthermore, its control flow cannot be circumvented, e.g. by buffer overflows or integer overflows. This is called "silver level" in SPARK parlance. Additionally, we prove key integrity properties ("gold level"), e.g. that optional fields are accessed if and only if all requirements defined in the RecordFlux specification are met.

Case Studies

RecordFlux greatly eases the specification and handling of binary messages. But is it suitable for real-world applications? We conducted a number of case studies to validate that it in fact is.

Packet Sniffer

Packet sniffers are tools often used by administrators to diagnose network problems. They capture and dissect all data received on a network interface to allow for a structured analysis of packet content. Famous open source examples are Wireshark and tcpdump. As packet sniffers need to handle a large number of complex protocols, they also tend to be complex. There have been errors which allowed attackers to mount remote exploits against packet sniffers. For this reason, it is discouraged, for example, to run Wireshark under a privileged user account. Obviously, a formally verified packet sniffer is desirable to eliminate the risk of an attack when analyzing traffic from untrusted sources. We prototyped a very simple packet sniffer for IP/UDP on Linux for which we proved the absence of runtime errors. The output is similar to that of other packet sniffers:

$ sudo ./obj/sniff_udp_in_ip

IP: Version: 4 IHL: 5 DSCP: 0 ECN: 0 TLen: 53 Id: 9059 DF: 1 MF: 0
FOff: 0 TTL: 64 Proto: UDP HCSum: 6483 Src: 127.0.0.1 Dst: 127.0.0.1,
UDP: SPort: 58423 DPort: 53 Len: 33 CSum: 65076 Payload: b9 7d 01
00 00 01 00 00 00 00 00 00 04 63 6f 61 70 02 6d 65 00 00 1c 00 01

TLS 1.3

Another area where correct parsers are essential are security protocols. The most important security protocol on the internet is TLS - whenever a browser connects to a remote server using the https protocol, in the background some version of TLS is used. We formalized the message format of the latest TLS version 1.3 according to RFC 8446 and generated a SPARK parser for TLS from the specification.

An open question remained, though: Can the generated code handle real-world TLS traffic and is there a performance penalty compared to unverified implementations? While we are working on a verified component-based TLS 1.3 implementation completely done in SPARK, it is not yet available for this experiment. As an alternative, we used an open source TLS 1.3 library by Facebook named Fizz and replaced its TLS parser by our generated code. As the C++ types used by Fizz (e.g. vectors) could not easily be bound to Ada, glue code had to be written manually to translate between the C++ and the SPARK world. We ensured that all untrusted data the library comes in touch with is handled by SPARK code. For the SPARK part, we proved the absence of runtime errors and the invariants stated in the specification.

Our constructive approach turned out to be effective for improving the security of existing software. In CVE-2019-3560 an integer overflow was found in the Fizz library. Just by sending a short, specially crafted sequence of messages, an attacker could mount a Denial of Service attack against an application using Fizz by putting it into an infinite loop. While Facebook fixed this bug by using a bigger integer type, the RecordFlux parser prevents this issue by rejecting packets with invalid length fields.

Despite the required transformations, the performance overhead was surprisingly low. For the TLS handshake layer - the part that negotiates cryptographic keys when the communication starts - the throughput was 2.7% lower than for the original version. For the TLS record layer, which encrypts and decrypts packets when a connection is active, the throughput was only 1.1% lower.

Conclusion and Further Information

With RecordFlux, creating SPARK code that handles complex binary data has become a lot easier. With a proper specification the generated code can often be proven automatically. In the future, we will extend RecordFlux to support the generation of binary messages and the modeling of the dynamic behavior of binary protocols.

For more information see our research paper and the language reference. If you have comments, found a bug or have suggestions for enhancements, feel free to open an issue on GitHub or write an email.

The Power of Technology Integration and Open Source https://blog.adacore.com/the-power-of-technology-integration Tue, 15 Oct 2019 12:04:00 +0000 Arnaud Charlet https://blog.adacore.com/the-power-of-technology-integration

Part of our core expertise at AdaCore is to integrate multiple technologies as smoothly as possible and make it a product. This started at the very beginning of our company by integrating a code generator (GCC) with an Ada front-end (GNAT) which was then followed by integrating a debugger engine (GDB) and led to today's rich GNAT Pro offering.

Today we are going much further in this area and I am going to give you a few concrete examples in this post.

For example, take our advanced static analysis engine CodePeer and let's look at it from two different angles (a bit like bottom-up and top-down, if you will): what does it integrate, and what other products integrate it?

From the first perspective, CodePeer integrates many different and complex pieces of technology: the GNAT front-end, various "core" GNAT tools (including GNATmetric, GNATcheck, GNATpp), the AdaCore IDEs (GNAT Studio, GNATbench), GNATdashboard, a Jenkins plug-in, GPRbuild, the codepeer engine itself, and finally, as of version 20, libadalang and light checkers based on libadalang (aka "LAL checkers").

Thanks to this complex integration, CodePeer users can launch various tools automatically, have their findings stored in a common database, and use a common user interface to drive it all. Indeed, from CodePeer you get access to: GNAT warnings, GNATcheck messages, LAL checker findings, and CodePeer "advanced static analysis" messages.

And the list will continue growing in the future, but that's for another set of posts!

Now from the other perspective, the codepeer engine is also integrated as an additional prover in our SPARK Pro product, to complement the SMT solvers integrated in SPARK, and is also used in our QGen product as the back-end of the QGen Verifier which performs static analysis on Simulink models.

Speaking of SPARK, this is also another good example of complex integration of many different technologies: the GNAT front-end, GPRbuild, GNAT Studio and GNATbench, GNATdashboard, Why3, CVC4, Z3, Alt-Ergo, the codepeer engine, and the SPARK engine itself.

Many of these components are developed in house at AdaCore, and many other components are developed primarily outside AdaCore (such as GCC, GDB, Why3, CVC4, ...). Such complex integration is only possible because all these components are Open Source and precisely allow this kind of combination. Add on top of that AdaCore's expertise in such integration and productization, and you get our ever growing offering!

Learning SPARK via Conway's Game of Life https://blog.adacore.com/learning-spark Thu, 10 Oct 2019 11:42:21 +0000 Michael Frank https://blog.adacore.com/learning-spark

I began programming in Ada in 1990 but, like many users, it took me a while to become a fan of the language. A brief interlude with Pascal in the early 90’s gave me a better appreciation of the strong-typing aspects of Ada, but continued usage helped me learn many of the other selling points.

I’ve recently started working at AdaCore as a Mentor, which means part of my job is to help companies transition from other languages to Ada and/or SPARK. At my last job, the development environment was mixed C++ and Ada (on multiple platforms) so, to simplify things, we were coding in Ada 95. Some of our customers’ code involved Ada 2005 and Ada 2012 constructs, so I did learn the new language variants and see the benefits, but I hadn’t actually written anything using contracts or expression functions or the like.

We all know the best way to learn is to do, so I needed to come up with something that would require me to increase my knowledge not only of the Ada language, but also to get some more experience with the AdaCore tools (GPS, CodePeer) as well as learning how to prove the correctness of my SPARK code. So I settled on Conway’s Game of Life (Wikipedia). This is an algorithmic model of cell growth on a 2-dimensional grid following four simple rules about cell life and death:

• A live cell with fewer than two live neighbors dies (underpopulation)
• A live cell with two or three live neighbors stays alive (life)
• A live cell with more than three live neighbors dies (overpopulation)
• A dead cell with exactly three live neighbors becomes a live cell (birth)
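As a sketch, the four rules above collapse neatly into a single SPARK expression function (the name Next_State and its parameters are my own illustration, not taken from the project's code):

```ada
--  Hypothetical sketch: the four Life rules as one expression function.
--  Survival needs 2 or 3 live neighbors; birth needs exactly 3.
function Next_State (Alive : Boolean; Live_Neighbors : Natural) return Boolean is
  (if Alive then Live_Neighbors in 2 .. 3 else Live_Neighbors = 3);
```

Side-effect-free expression functions like this are convenient in SPARK because their definition is directly visible to the provers.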

This little application gave me everything I needed to experiment with, and even a little more. Obviously, this was going to be written in Ada following SPARK rules, so I could program using the latest language variant. The development environment was going to be AdaCore’s GPS Integrated Development Environment, and I could use CodePeer for static analysis on the code I was writing, so I could get some experience with these tools. I could then use the SPARK provers to prove that my code was correct. In addition, as Life has a visual aspect, I was able to do a little work with GtkAda to make a graphical representation of the algorithm.

I started with the lower level utilities needed to perform the algorithm – counting the number of live neighbors. Easy, right? Yes, it is easy to code, but not so easy to prove using the SPARK provers. If I wasn’t worried about provability, I could just write a simple doubly-nested loop that would iterate over rows and columns around the target cell, and count how many neighboring cells are alive.

The issue is that, with loops in SPARK, we need to “remind” SPARK what has happened in previous loop iterations (typically using “pragma Loop_Invariant”) so it has all the knowledge it needs to prove the current iteration. This issue is made more difficult when we use nested loops. So, I rewrote my “count all neighbors” routine into two routines - “count neighbors in a row” and “total neighbors in all rows”. Each routine has a single loop, which made specifying the loop invariant much easier.

And that “rewriting” helped reinforce my understanding of “good coding practices”. Most of us were taught that high-complexity subroutines were problematic because they were difficult to test. So we write code that tries to keep the complexity down. The next level in safety-critical software – provability – is also made more difficult by higher complexity subroutines. In my case, my counting routines needed to check to see if a cell was alive and, if so, increment the count. That’s just a simple “if” statement. But, to make it easier on the prover, I created a new routine that would return 1 if the cell was alive and 0 otherwise. Very simple, and therefore very easy to prove correctness. Now, my counting routine could just iterate over the cells, and sum the calls to the function – once again, easier to prove.
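A minimal sketch of that decomposition might look as follows (the types, bounds, and names here are assumptions for illustration, not the actual project code):

```ada
type Cell is (Dead, Alive);
type Row_Index is range 1 .. 10;
type Row is array (Row_Index) of Cell;

--  Trivially provable helper: returns 1 for a live cell, 0 otherwise,
--  so the counting loop below contains no branching at all
function To_Count (C : Cell) return Natural is
  (if C = Alive then 1 else 0);

--  Single loop, single invariant: the running total is bounded by the
--  number of cells visited so far
function Count_Row (R : Row) return Natural is
   Total : Natural := 0;
begin
   for I in R'Range loop
      pragma Loop_Invariant (Total <= Natural (I - R'First));
      Total := Total + To_Count (R (I));
   end loop;
   return Total;
end Count_Row;
```

A "total over all rows" routine would then loop over Count_Row results in the same single-loop, single-invariant style.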

Once the low-level functions were created the “real” coding began. And in trying to write pre- and post-conditions for higher level algorithms, I found a great help in the CodePeer tool. This tool performs static analysis on the code base, from simple and quick to exhaustive and slow. The correlation between this tool and the prover tools is that, if I was having a problem writing a valid postcondition, more often than not a CodePeer report would show some warning indicating my preconditions were not complete. For example, every time you add numbers, CodePeer would remind you that a precondition would have to be set to ensure no overflow. Without that warning, everything looks fine, but the prover will not be able to validate the routine. In addition, when using GPS, CodePeer will insert annotations into the Source Code View detailing many of the aspects of each subprogram. These annotations typically include pre- and post-conditions, making it easier to determine which of these aspects could be written into the code to make proving easier.

So, a development cycle was created. First, write a subprogram to “do something” (usually this means writing other subprograms to help!). Next, run CodePeer to perform static analysis. In addition to finding run-time errors, the annotations would help determine preconditions for the routines. I would study these annotations and, for each pre- and post-condition, determine whether it made sense. If the condition did not make sense, I needed to modify my subprogram to get the correct results. If the condition made sense, then I needed to decide whether to encode it using aspects or just ignore it, because sometimes the annotations were more detailed than I needed. Once I got past CodePeer analysis, I would try to prove the subprograms I created. If they could be proven, I could go on to the next step in building my application. If not, I needed to modify my code and start the cycle over.

This process is basically a one-person Code Review! In a large software project, these steps would be the same on some portion of the code, and only then would the code go into a peer review process – with the benefit that the reviewers are only concerned with design issues, and don’t have to be worried about dealing with run-time errors or verifying that the implementation was correct. So, not only did I achieve my objective of learning aspects of the languages and the tools, I got first-hand evidence of the benefits of writing software “the right way!”

To access the source code for this project, please check out my GitHub repository here: https://github.com/frank-at-adacore/game_of_life

Pointer Based Data-Structures in SPARK https://blog.adacore.com/pointer-based-data-structures-in-spark Tue, 08 Oct 2019 12:19:34 +0000 Claire Dross https://blog.adacore.com/pointer-based-data-structures-in-spark

As seen in a previous post, it is possible to use pointers (or access types) in SPARK provided the program abides by a strict memory ownership policy designed to prevent aliasing. In this post, we will demonstrate how to define pointer-based data-structures in SPARK and how to traverse them without breaking the ownership policy.

Pointer-based data structures can be defined in SPARK as long as they do not introduce cycles. For example, singly-linked lists and trees are supported whereas doubly-linked lists are not. As an example for this post, I have chosen to use a map encoded as a singly-linked list of pairs of a key and a value. To define a recursive data structure in Ada, you need to use an access type. This is what I did here: I declared an incomplete view of the type Map, and then an access type Map_Acc designating this view. Then, I could give the complete view of Map as a record with a component Next of type Map_Acc:

   type Map;
   type Map_Acc is access Map;
   type Element_Acc is not null access Element;
   type Map is record
      Key   : Positive;
      Value : Element_Acc;
      Next  : Map_Acc;
   end record;

Note that I also used an access type for the element that is stored in the map (and not for the key). Indeed, we assume here that the elements stored in the map can be big, so that we may want to modify them in place rather than copying them. We will see later how this can be done.

To be able to write specifications on my maps, I need to define some basic properties for them. To describe precisely my maps, I will need to speak about:

• whether there is a mapping for a given key in my map, and
• the value associated to a key by the map, knowing that there is a mapping for this key.

These concepts are encoded as the following functions:

   function Model_Contains (M : access constant Map; K : Positive) return Boolean;

function Model_Value (M : access constant Map; K : Positive) return Element
with  Pre => Model_Contains (M, K);

Model_Contains should return True when there is an occurrence of the key K in the map M, and Model_Value should retrieve the element associated with the first occurrence of K in M.

Now, I need to give a definition to my functions. The natural way to express the meaning of properties on linked data-structures is through recursion. This is what I have done here for both Model_Contains and Model_Value:

   function Model_Contains (M : access constant Map; K : Positive) return Boolean is
     (M /= null and then (M.Key = K or else Model_Contains (M.Next, K)));
   --  A key K has a mapping in a map M if either K is the key of the first mapping of M or if
   --  K has a mapping in M.Next.

   function Model_Value (M : access constant Map; K : Positive) return Element is
     (if M.Key = K then M.Value.all else Model_Value (M.Next, K));
   --  The value mapped to key K by a map M is either the value of the first mapping of M, if
   --  K is the key of this mapping, or the value mapped to K in M.Next otherwise.

Note that we do not need to care about cases where there is no mapping for K in M in the definition of Model_Value, as we have put as a precondition that it cannot be used in such cases.

As these functions are recursive, GNATprove cannot determine that they will terminate. As a result, it will not use their definition in some cases, lest they be incorrect. To avoid this problem, I have added Terminating annotations to their declarations. These annotations cannot be verified by the tool, but they can easily be discharged by review, so I accepted the check messages with appropriate justification. I have also marked the two functions as ghost, as I will assume that, for efficiency reasons or to control the stack usage, we don't want to use recursive functions in normal code:

   function Model_Contains (M : access constant Map; K : Positive) return Boolean
   with
     Ghost,
     Annotate => (GNATprove, Terminating);
   pragma Annotate
     (GNATprove, False_Positive,
      "subprogram ""Model_Contains"" might not terminate",
      "Recursive calls occur on strictly smaller structure");

   function Model_Value (M : access constant Map; K : Positive) return Element
   with
     Ghost,
     Annotate => (GNATprove, Terminating),
     Pre => Model_Contains (M, K);
   pragma Annotate
     (GNATprove, False_Positive,
      "subprogram ""Model_Value"" might not terminate",
      "Recursive calls occur on strictly smaller structure");

Now that we have defined our specification properties for maps, let's try to implement some functionality. We will start with the easiest one, a Contains function which checks whether there is a mapping for a given key in a map. In terms of functionality, it should really return the same value as Model_Contains. However, implementation-wise, we would like to use a loop to avoid introducing recursion. Here, we need to be careful. Indeed, traversing a linked data-structure using a loop generally involves an alias between the traversed structure and the pointer used for the traversal. SPARK supports this use case through the concepts of local observers and borrowers (borrowed from Rust's regular and mutable borrows). What we need here is an observer. Intuitively, it is a local variable which is granted, for the duration of its lifetime, read-only access to a component of another object. While the observer is in scope, the actual owner of the object also has read-only access to the object. When the observer goes out of scope, the owner regains complete ownership (provided it used to have it, and there are no other observers in scope). Let us see how we can use it in our example:

   function Contains (M : access constant Map; K : Positive) return Boolean with
     Post => Contains'Result = Model_Contains (M, K);

   function Contains (M : access constant Map; K : Positive) return Boolean is
      C : access constant Map := M;
   begin
      while C /= null loop
         pragma Loop_Invariant (Model_Contains (C, K) = Model_Contains (M, K));
         if C.Key = K then
            return True;
         end if;
         C := C.Next;
      end loop;
      return False;
   end Contains;

Because it is declared with an anonymous access-to-constant type, the assignment to C at its declaration is not considered to be a move, but an observe. In the body of Contains, C will be a read-only alias of M. M itself will retain read-only access to the data it designates, and will regain full ownership at the end of Contains (note that here, it does not change much, as M itself is an access-to-constant type, so it has read-only access to its designated data to begin with).

In the body of Contains, we use a loop which searches for a key equal to K in M. We see that, even though C is an observer, we can still assign to it. It is because it is the data designated by C which is read-only, and not C itself. When we assign directly into C, we do not modify the underlying structure, we simply modify the handle. Note that, as usual with loops in SPARK, I had to provide a loop invariant to help GNATprove verify my code. Here it states that K is contained in M, if and only if, it is contained in C. This is true because we have only traversed values different from K until now. We see that, in the invariant, we are allowed to mention M. This is because M retains read-only access to the data it designated during the observe.

Let's now write a function to retrieve in normal code the value associated to a key in a map. Since elements can be big, we don't want to copy them, so we should not return the actual value, but rather a pointer to the object stored in the map. For now, let's assume that we are interested in a read-only access to the element. As we have seen above, in the ownership model of SPARK, a read-only access inside an existing data-structure is an observer. So here, we want a function which computes and returns an observer of the input data-structure. This is supported in SPARK, provided the traversed data-structure is itself an access type, using what we call "traversal functions". An observing traversal function takes an access type as its first parameter and returns an anonymous access-to-constant object which should be a component of this first parameter.

   function Constant_Access (M : access constant Map; K : Positive) return not null access constant Element
   with
     Pre  => Model_Contains (M, K),
     Post => Model_Value (M, K) = Constant_Access'Result.all;

   function Constant_Access (M : access constant Map; K : Positive) return not null access constant Element
   is
      C : access constant Map := M;
   begin
      while C /= null loop
         pragma Loop_Invariant (Model_Contains (C, K) = Model_Contains (M, K));
         pragma Loop_Invariant (Model_Value (C, K) = Model_Value (M, K));
         if C.Key = K then
            return C.Value;
         end if;
         C := C.Next;
      end loop;
      raise Program_Error;
   end Constant_Access;

The return type of Constant_Access is rather verbose. It states that it computes an anonymous access-to-constant object (an observer in SPARK) which is not allowed to be null. Indeed, since we know that we have a mapping for K in M when calling Constant_Access, we are sure that there will always be an element to return. This also explains why we have a raise statement as the last statement of Constant_Access: GNATprove is able to prove that this statement is unreachable, and we need that statement to confirm that the bottom of the function cannot be reached without returning a value.

The contract of Constant_Access is straightforward, as we are again reimplementing a concept that we already had in the specification (Constant_Access returns a pointer to the result of Model_Value). In the body of Constant_Access, we create a local variable C which observes the data-structure designated by M, just like we did for Contains. When the correct mapping is found, we return an access to the corresponding value in the data-structure. GNATprove will make sure that the structure returned is a part of the first parameter (here M) of the function.

Note that this function does not breach the ownership policy of SPARK, as what is computed by Constant_Access is an observer, so it will not be possible to assign it to a component inside another data-structure, for example.

Now let's assume that we want to modify the value associated to a given key in the data-structure. We can provide a Replace_Element procedure which takes a map M, a key K, and an element V and replaces the value associated to K in M by V. We will write in its postcondition that the value associated to K in M after the call is V (this is not complete, as we do not say anything about other mappings, but for the sake of the explanation, let's stick to something simple).

   procedure Replace_Element (M : access Map; K : Positive; V : Element) with
     Pre  => Model_Contains (M, K),
     Post => Model_Contains (M, K) and then Model_Value (M, K) = V;

In its body, we want to loop, find the matching pair, and replace the element by the new value V. Here we cannot use an observer to search for the key K, as we want to modify the mapping afterward. Instead, we will use a local borrower. Just like an observer, a borrower takes the ownership of a part of an existing data-structure for the duration of its lifetime, but it takes full ownership, in the sense that a borrower has the right to both read and modify the borrowed object. While the borrower is in scope, the borrowed object cannot be accessed directly (there is a provision in the RM for reading it that the tool uses in particular cases, as we will see later). At the end of the scope of the borrower, the ownership returns to the borrowed object. A borrower in SPARK is introduced by the declaration of an object of an anonymous access-to-variable type (note the use of "access Map" instead of "access constant Map"):

   procedure Replace_Element (M : access Map; K : Positive; V : Element) is
      X : access Map := M;
   begin
      while X /= null loop
         pragma Loop_Invariant (Model_Contains (X, K));
         pragma Loop_Invariant
           (Pledge (X, (if Model_Contains (X, K) then Model_Contains (M, K)
            and then Model_Value (M, K) = Model_Value (X, K))));

         if X.Key = K then
            X.Value.all := V;
            return;
         end if;
         X := X.Next;
      end loop;
   end Replace_Element;

The body of Replace_Element is similar to the body of Contains, except that we modify the borrower before returning from the procedure. However, we see that the loop invariant is more involved. Indeed, as we are modifying M using X, we need to know how the modification of X will affect M. Usually, GNATprove can track this information without help, but when loops are involved, this information needs to be supplied in the loop invariant. To describe how a structure and its borrower are affected, I have used a pledge. The notion of pledges was introduced by researchers from ETH Zurich to verify Rust programs (see the preprint of their work to be published at OOPSLA this year). Conceptually, a pledge is a property which will always hold between the borrowed object and the borrower, no matter the modifications that are made to the borrower. As pledges are not yet supported at a language level in SPARK, it is possible to mark (a part of) an assertion as a pledge by using an expression function which is annotated with a Pledge Annotate pragma:

   function Pledge (Borrower : access constant Map; Prop : Boolean) return Boolean is
     (Prop)
   with
     Ghost,
     Annotate => (GNATprove, Pledge);
   --  Pledge for maps

Note that the name of the function could be something other than "Pledge", but the annotation should use the string "Pledge". A pledge function is a ghost expression function which takes a borrower and a property and simply returns the property. When GNATprove encounters a call to such a function, it knows that the property given as a second parameter to the call must be handled as a pledge of the local borrower given as a first parameter. It will attempt to verify that, no matter the modification which may be done to Borrower, the property will still hold. Inside Prop, it is necessary to be able to mention the borrowed object. This is why there is provision for reading it in the SPARK RM, and in fact, it is the only case where the tool will allow a borrowed object to be mentioned.

In our example, the pledge of X states that, no matter how X will be modified afterward, if it happens that X has a mapping for K, then M will have the same mapping for K. This is true because we have not encountered K in the previous iterations of the loop, so if we find a mapping in X for K, it will be the first such mapping in M too. Note that a more precise pledge, like Pledge (X, Model_Contains (X, K)), would not be correct. Given a borrower aliasing the root of a subtree in a borrowed object, the pledge relationship expresses what necessarily holds for that subtree, independently of any modifications that may occur to it through the borrower. This is why we need to state the pledge here with an if-expression: "if the borrower X still contains the key K, then M necessarily contains the key K, and both agree on the associated value".

After the loop, M has been modified through X. GNATprove does not know anything about it but what can be deduced from the current value of X and information about the relation between M and X supplied by the loop-invariants that contain pledges. These invariants don't completely define the relation between X and M, but they give enough information to deduce the postcondition when a mapping for K is found in X. Since at the last iteration X.Key = K, GNATprove can deduce that Model_Contains (X, K) and Model_Value (X, K) = V holds after the loop. Using the pledges from the loop-invariants, it can infer that we also have Model_Contains (M, K) and Model_Value (M, K) = V.

Before we reach the end of this post, we will go one step further. We now have a way to replace an element with another one in the map. It can be used if we want to do a modification inside an element of the map, but it won't be efficient, as the element will need to be copied. We would like to provide a way to find the value associated to a given key in a map, and return an access to it so that it can be modified in-place. As for Constant_Access, this can be done using a traversal function:

   function Pledge (Borrower : access constant Element; Prop : Boolean) return Boolean is
     (Prop)
   with
     Ghost,
     Annotate => (GNATprove, Pledge);
   --  Pledge for elements

   function Reference (M : access Map; K : Positive) return not null access Element
   with
     Pre  => Model_Contains (M, K),
     Post => Model_Value (M, K) = Reference'Result.all and then
       Pledge (Reference'Result, Model_Contains (M, K) and then
               Model_Value (M, K) = Reference'Result.all);

Reference returns a mutable access inside the data-structure M. We can see that, in its postcondition, I have added a pledge. Indeed, since the result of Reference is an access-to-variable object, a user of my function can use it to modify M. If I want the tool to be able to deduce anything about such a modification, I need to describe the link between the result of a call to the Reference function, and its first parameter. Here my pledge gives the same information as the postcondition of Replace_Element, that is, that the value designated by the result of the call will be the one mapped from K in M.

   function Reference (M : access Map; K : Positive) return not null access Element
   is
      X : access Map := M;
   begin
      while X /= null loop
         pragma Loop_Invariant (Model_Contains (X, K));
         pragma Loop_Invariant (Model_Value (X, K) = Model_Value (M, K));
         pragma Loop_Invariant
           (Pledge (X, (if Model_Contains (X, K) then Model_Contains (M, K)
                        and then Model_Value (M, K) = Model_Value (X, K))));

         if X.Key = K then
            return X.Value;
         end if;
         X := X.Next;
      end loop;
      raise Program_Error;
   end Reference;


The body of Reference contains the same loop as Constant_Reference, except that I have added a loop invariant, similar to the one in Replace_Element, to supply a pledge for X.

The specification and verification of pointer-based data structures is a challenge in deductive verification, whether the "pointer" part is implemented as a machine pointer or as an array index (as an example, see our previous post about verifying insertion into a red-black tree). In addition, SPARK has a strict ownership policy which completely prevents the use of some (doubly-linked) data structures, and complicates the writing of usual algorithms on others. However, I think I have demonstrated in this post that it is still feasible to write and verify some of these algorithms in SPARK, with comparatively few user-supplied annotations.

Combining GNAT with LLVM https://blog.adacore.com/combining-gnat-with-llvm Tue, 01 Oct 2019 12:24:29 +0000 Arnaud Charlet

Presenting the GNAT LLVM project

At AdaCore labs, we have been working for some time now on combining the GNAT Ada front-end with a different code generator than GCC.

The GNAT front-end is particularly well suited for this kind of exercise. Indeed, we've already plugged many different code generators into GNAT in the past, including:

• a Java bytecode generator (the old "JGNAT" product, for those who remember it);
• a .NET bytecode generator derived from the Java one;
• a Why3 generator, used at the heart of the SPARK formal verification technology;
• a SCIL (Statically Checkable Intermediate Language) generator, used in our advanced static analyzer CodePeer; and
• a C back-end, used in GNAT CCG (Common Code Generator).

This time, we're looking at another general purpose code generator, called LLVM, in order to expand the outreach of Ada to the LLVM ecosystem (be it the compiler itself or other components such as static analysis tools).

This work-in-progress research project is called "GNAT LLVM" and is meant to show the feasibility of generating LLVM bitcode for Ada and to open the LLVM ecosystem to Ada, including tools such as KLEE, which we are also planning to work with and add Ada support for. Note that we are not planning on replacing any existing GNAT port based on GCC; this project is an addition rather than a replacement.

Technical Approach

We decided on a "pure" LLVM approach that's as easy to integrate and fit into the LLVM ecosystem as possible, using the existing LLVM API directly, while at the same time doing what we do best: writing Ada code! So we use the LLVM "C" API and automatically generate Ada bindings for it via the GCC -fdump-ada-spec switch, plus a bit of postprocessing done in a Python script. We can then call these bindings directly from Ada, which allows us to both easily traverse the GNAT tree and generate LLVM instructions, all in Ada.

By the way, if you know about the DragonEgg project then a natural question would be "why are you starting a GNAT LLVM project from scratch instead of building on top of DragonEgg?". If you want to know the answer, check the file README.dragonegg in the repository!

Next Steps

We have just published the GNAT LLVM tool sources licensed under GPLv3 on GitHub for hobbyists and researchers to experiment with it and give us their feedback.

If you are interested, give it a try, and let us know how it works for you via Issues and Pull Requests on GitHub, or by leaving comments at the bottom of this page! One exciting experiment would be to compile Ada code using the WebAssembly LLVM back-end, for instance!


AdaCore's fourth annual Make with Ada competition launched this week with over $8K in cash and prizes to be awarded for the most innovative embedded systems projects developed using Ada and/or SPARK. The contest runs from September 10, 2019, to January 31, 2020, and participants can register on the Hackster.io developer platform here.

What's new?

Based on feedback from previous participants, we changed the competition evaluation criteria, so projects will now be judged on:

• Software quality - Does the software meet its requirements?
• Openness - Is the project open source?
• "Buzz factor" - Does it have the wow effect to appeal to the software community?

Further information about the judging criteria is available here.

We've also increased the amount of prizes to give more projects a chance to win:

• One First Prize, in the amount of 2000 (two thousand) USD
• Ten Finalist Prizes, in the amount of 600 (six hundred) USD each
• One Student-only Prize (an Analog Discovery 2 Pro Bundle worth 299.99 USD), which will go to the best-ranking student finalist. A project submitted by a student is eligible for both the Student-only Prize and the cash prizes.

Award winners will be announced in March 2020, and project submissions will be evaluated by a judging panel consisting of Bill Wong, Senior Technology Editor at Electronic Design, and Fabien Chouteau, AdaCore software engineer and author of the Make with Ada blog post series.

Don't forget that the new and enhanced GNAT Community 2019 is also available for download for use in your projects!

First Ada Virtual Conference organized by and for the Ada community https://blog.adacore.com/ada-virtual-conferences Thu, 05 Sep 2019 13:34:25 +0000 Maxim Reznik

The Ada community gathered recently around an exciting new initiative - an Ada Virtual Conference, to present Ada-related topics in a 100% remote environment.
The first such conference took place on August 10th, 2019, around the topic of the new features in Ada 202x. The conference took the form of a video/audio chat based on the open source platform jitsi.org. No registration was required, just access over a web browser or mobile application, with the possibility to participate anonymously. The presentation is just short of 25 minutes and is available on YouTube, Vimeo or via DropBox.

Note that while the talk presents the current draft for Ada 202x, some features are still in discussion and may not make it into the standard, or may do so with a different syntax or set of rules. That's in particular the case for all parallelism-related features and iterators (that is, up to slide 16 in the talk), which are being revisited by a group of people within AdaCore in order to submit recommendations to the Ada Rapporteur Group next year. As the talk concludes, please contribute to the new Ada/SPARK RFCs website if you have ideas about the future of the language!

The Ada Virtual Conference now has a dedicated website, where anyone can vote for the topic of the next event. We invite you to participate!

Image by Tomasz Mikołajczyk from Pixabay.

Secure Use of Cryptographic Libraries: SPARK Binding for Libsodium https://blog.adacore.com/secure-use-of-cryptographic-libraries-spark-binding-for-libsodium Tue, 03 Sep 2019 11:54:00 +0000 Isabelle Vialard

The challenge faced by cryptography APIs is to make building functional and secure programs easy for the user. Even with good documentation and examples, this remains a challenge, especially because incorrect use is still possible. I made bindings for two C cryptography libraries, TweetNaCl (pronounced "Tweetsalt") and Libsodium, with the goal of making these bindings easier to use than the original APIs by making it possible to automatically detect a large set of incorrect uses.
In order to do this, I wrote two bindings for each library: a low-level binding in Ada, and a higher level one in SPARK, which I call the interface. I used Ada's strong-typing characteristics and SPARK proofs to enforce a safe and functional use of the subprograms in the library. In this post I will explain the steps I took to create these bindings, and how to use them.

Steps to create a binding

I will use one program as an example: Crypto_Box_Easy from Libsodium, a procedure which encrypts a message. At first I generated a binding using the Ada spec dump compiler:

   gcc -c -fdump-ada-spec -C ./sodium.h

which gives me this function declaration:

   function crypto_box_easy
     (c    : access unsigned_char;
      m    : access unsigned_char;
      mlen : Extensions.unsigned_long_long;
      n    : access unsigned_char;
      pk   : access unsigned_char;
      sk   : access unsigned_char) return int  -- ./sodium/crypto_box.h:61
   with Import => True,
        Convention => C,
        External_Name => "crypto_box_easy";

Then I modified this binding. First I changed the types used, and I added in and out parameters. I removed the access parameters. Scalar parameters with out mode are passed using a temporary pointer on the C side, so this works even without explicit pointers. For unconstrained arrays like Block8 it is more complex. In Ada, unconstrained arrays are represented by what are called fat pointers, that is to say a pointer to the bounds of the array plus a pointer to the first element of the array. In C the expected parameter is a pointer to the first element of the array. So a simple binding like this one should not work.
What saves the situation is this line, which forces passing the pointer to the first element directly:

   with Convention => C;

Thus we go from a low-level language where memory is indexed by pointers to a typed language like Ada:

   function crypto_box_easy
     (c    : out Block8;
      m    : in Block8;
      mlen : in Uint64;
      n    : in out Block8;
      pk   : in Block8;
      sk   : in Block8) return int  -- ./sodium/crypto_box.h:61
   with Import => True,
        Convention => C,
        External_Name => "crypto_box_easy";

After this, I created an interface in SPARK that uses this binding: the goal is to provide the same program as in the binding, with some modifications. Some useless parameters are deleted. For instance the C program often asks for an array and the length of this array (like m and mlen), which is useless since the length can be found with the attribute 'Length. Functions with out parameters must be changed into procedures to comply with SPARK rules. Finally, new types can be created to take advantage of strong typing, as well as preconditions and postconditions.

   procedure Crypto_Box_Easy
     (C  : out Cipher_Text;
      M  : in Plain_Text;
      N  : in out Box_Nonce;
      PK : in Box_Public_Key;
      SK : in Box_Secret_Key)
   with Pre => C'Length = M'Length + Crypto_Box_MACBYTES
     and then Is_Signed (M)
     and then Never_Used_Yet (N);

How to use strong typing, and why

Most of the parameters required by these programs are arrays: some arrays for messages, others for keys, etc. The type Block8 could be enough to represent them all:

   type Block8 is array (Index range <>) of uint8;

But then anyone could use a key as a message, or a message as a key. To avoid that, in Libsodium I derived new types from Block8. For instance, Box_Public_Key and Box_Secret_Key are the key types used by the Crypto_Box_* programs. The type for messages is Plain_Text, and the type for messages after encryption is Cipher_Text. Thus I take advantage of Ada's strong-typing characteristics in order to enforce the right use of the programs and their parameters.
With TweetNaCl, I did things a bit differently: I created the different types directly in the binding, for the same result. Since TweetNaCl is a very small library, it was faster that way. In Libsodium I chose to keep my first binding as close as possible to the generated one, and to focus on the interface, where I use strong typing and contracts (preconditions and postconditions).

Preconditions and postconditions

Preconditions and postconditions serve the same purpose as derived types: they enforce a specific use of the programs. There are two kinds of conditions.

The first kind are mostly conditions on an array's length. They are here to ensure the program will not fail. For instance:

   Pre => C'Length = M'Length + Crypto_Box_MACBYTES;

This precondition says that the cipher text should be exactly Crypto_Box_MACBYTES bytes longer than the message that we want to encrypt. If this condition is not fulfilled, then execution will fail. Note that we can reference C'Length in the precondition, even though C is an out parameter, because the length attribute of an out parameter of an array type is available when the call is made, so we can reason about it in our precondition.

The other kind of condition is used to avoid an unsafe use of the programs. For instance, Crypto_Box uses a Nonce. A Nonce is a small array used as a complement to a key. In theory, to be safe, a key should be long and used for only one message. However, it is costly to generate a new long key for each message. So we use a long key for every message, together with a Nonce that is different for each message but easy to generate. Thus the encryption is safe, but only if we remember to use a different Nonce for each message. To ensure that, I wrote this precondition:

   Pre => Never_Used_Yet (N)

Never_Used_Yet is a ghost function, that is, a function that doesn't affect the program's behavior.
   type Box_Nonce is limited private;
   function Never_Used_Yet (N : Box_Nonce) return Boolean with Ghost;

When GNATprove is used, it sees that procedure Randombytes (N : out Box_Nonce), the procedure that randomly generates the Nonce, has the postcondition Never_Used_Yet (N). So it deduces that when the Nonce is first generated, Never_Used_Yet (N) is true. Thus the first time N is used by Crypto_Box_Open, the precondition is valid. But if N is used a second time, GNATprove cannot prove that Never_Used_Yet (N) is still true (because N has mode in out, so it could have been changed). That's why it cannot prove a program that calls Crypto_Box_Open twice with the same Nonce.

There are ways around this condition: for instance one could copy a randomly generated Nonce many times to use it on different messages. To avoid that, Box_Nonce is declared as limited private: it cannot be copied, and GNATprove cannot prove that a copied Nonce has the same Never_Used_Yet property as a generated one. Box_Nonce and Never_Used_Yet are declared in the private part of the package with SPARK_Mode Off, so that GNATprove treats them as opaque entities:

   private
      pragma SPARK_Mode (Off);
      type Box_Nonce is new Block8 (1 .. Crypto_Box_NONCEBYTES);
      function Never_Used_Yet (N : Box_Nonce) return Boolean is (True);

Never_Used_Yet always returns True. It is a fake implementation; what matters is that it is hidden from the proof. It works at runtime because it is always used as a positive condition: as a program requirement it is always used as "Never_Used_Yet (N)" and never "not Never_Used_Yet (N)", so the conditions are always valid, and Never_Used_Yet doesn't affect the program's behavior, even if contracts are executed at runtime.

Another example of a precondition made with a ghost function is the function Is_Signed (M : Plain_Text). When you want to send an encrypted message to someone, you want this person to be able to check that this message is from you, so no one will be able to steal your identity.
To do this, you have to sign your message with Crypto_Sign_Easy before encrypting it with Crypto_Box_Easy. Trying to skip the signing step leads to a proof error.

How to use Libsodium_Binding

The repository contains:

• The project file libsodium.gpr
• The library directory lib
• The common directory, which contains:
  ◦ The libsodium_binding package, a low-level binding in Ada made from the files generated using the Ada spec dump compiler.
  ◦ The libsodium_interface package, a higher level binding in SPARK which uses libsodium_binding.
• include, a directory which contains the headers of libsodium.
• libsodium_body, a directory which contains the bodies.
• outside_src, a directory which contains the headers that were removed from include, to fix a problem of double definition.
• thin_binding, a directory which contains the binding generated using the Ada spec dump compiler.
• The test directory, which contains tests for each group of functions.
• A testsuite which runs the same tests as the ones in the test directory.
• The examples directory, with examples that use different groups of programs together. It also contains a program where a Nonce is used twice; as expected, it fails at proof stage.

outside_src and thin_binding are not used for the binding, but I left them in the repository because they show what I changed from the original libsodium sources and the generated Ada binding. This project is a library project, so the directory lib is the only thing necessary.

How to use TweetNaCl_Binding

The repository contains:

• The project file tweetnacl.gpr
• The common directory, which contains:
  ◦ The tweetnacl_binding package, a low-level binding in Ada made from the files generated using the Ada spec dump compiler.
  ◦ The tweetnacl_interface package, a higher level binding in SPARK which uses tweetnacl_binding.
  ◦ tweetnacl.h and tweetnacl.c, the header and the body of the library.
  ◦ randombytes.c, which holds randombytes, a program to generate random arrays.
• The test directory: test1 and test1b are functional examples of how to use the tweetnacl main programs; the others are examples of what happens if you give an array with the wrong size, if you try to use the same nonce twice, etc. They fail either at execution or at proof stage.

To use this binding, you just have to include the common directory in the Sources of your project file.

Proving a simple program doing I/O ... with SPARK https://blog.adacore.com/proving-a-simple-program-doing-io Tue, 09 Jul 2019 12:37:05 +0000 Joffrey Huguet

The functionality of many security-critical programs is directly related to Input/Output (I/O). This includes command-line utilities such as gzip, which might process untrusted data downloaded from the internet, but also any servers that are directly connected to the internet, such as webservers, DNS servers and so on. However, I/O has received little attention from the formal methods community, and SPARK also does not help the programmer much when it comes to I/O. SPARK has been used to debug functions from the standard library Ada.Text_IO, but this approach lacked support for error handling and didn't allow going up to the application level. As an example, take a look at the current specification of Ada.Text_IO.Put, which only recently has been annotated with some SPARK contracts:

   procedure Put (File : File_Type; Item : String) with
     Pre    => Is_Open (File) and then Mode (File) /= In_File,
     Global => (In_Out => File_System);

(We have suppressed the postcondition of this function, which talks about line and page length, a functionality of Ada.Text_IO which is not relevant to this blog post.)

We can see that Put has a very light contract that protects against some errors, such as calling it on a file that has not been opened for writing, but not against the many other possible errors related to writing, such as a full disk or file permissions.
If such an error occurs, Put raises an exception, but SPARK does not allow catching exceptions. So in the above form, even a proved SPARK program that uses Put may terminate with an unhandled exception. Moreover, Put does not specify what exactly is written. For example, one cannot prove in SPARK that two calls to Put with the same File argument write the concatenation of the two Item arguments to the file. This second problem can probably be solved using a stronger contract (though it would be difficult to do so in Ada.Text_IO, whose interface must respect the Ada standard), but together with the first point it means that, even if our program is annotated with a suitable contract and proved, we can only say something informal like "if the program doesn't crash because of unexpected errors, it respects its contract".

In this blog post, we propose to solve these issues as follows. We replace Put by a procedure Write which reports errors via its output Has_Written:

   procedure Write
     (Fd          : int;
      Buf         : Init_String;
      Num_Bytes   : Size_T;
      Has_Written : out ssize_t);

We can now annotate Write with a suitable postcondition that explains exactly what has been written to the file descriptor, and that a negative value of Has_Written signals an error. It is no accident that the new procedure looks like the POSIX system call write. In fact we decided to base this experiment on the POSIX API that consists of open, close, read and write, and simply added very thin SPARK wrappers around those system calls. The advantage of this approach is that system calls never crash (unless there are bugs in the kernel, of course) - they simply flag errors via their return value, or in our case, the out parameter Has_Written. This means that, assuming our program contains a Boolean variable Error that is updated accordingly after invoking the system calls, we can now formally write down the contract of our program in the form: if not Error then ...
And if one thinks about it, this is really the best any program dealing with I/O can hope for, because some errors cannot be predicted or avoided by the programmer, such as trying to write to a full disk or trying to open a file that does not exist. We could further refine the above condition to distinguish the behavior of the program depending on the error that was encountered, but we did not do this in the work described here. The other advantage of using a POSIX-like API is that porting existing programs from C to this API is simpler.

We validated this approach by writing a SPARK clone of the "cat" utility. We were able to prove that cat copies the content of the files in its arguments to stdout, unless errors were encountered. The project is available following this link.

How to represent the file system?

The main interest of the library is to be able to write properties about content, which means that we have to represent this content through something (global variables, state abstraction...). We decided to use maps that link a file descriptor to an unbounded string representing the content of the corresponding file. Formal_Hashed_Maps were convenient to use because they come with many functions that compare two different maps (e.g. Keys_Included_Except, or Elements_Equal_Except).

In Ada, even Unbounded_String (from Ada.Strings.Unbounded) is somehow bounded: the maximum length of an Unbounded_String is Natural'Last, a length that could be exceeded by certain files. We had to create another library, relying on Ada.Strings.Unbounded, that accepts appending two unbounded strings whose lengths are maximal. The choice we made was to drop any character that would make the length overflow, e.g. the append function has the following contract:

   function "&" (L, R : My_Unbounded_String) return My_Unbounded_String
   with
     Global => null,
     Contract_Cases =>
       (Length (L) = Natural'Last =>
          To_String ("&"'Result) = To_String (L),
          --  When L already has maximal length

        Length (L) < Natural'Last
          and then Length (R) <= Natural'Last - Length (L) =>
          To_String ("&"'Result) = To_String (L) & To_String (R),
          --  When R can be appended entirely

        others =>
          To_String ("&"'Result) =
            To_String (L) & To_String (R) (1 .. Natural'Last - Length (L)));
          --  When R cannot be appended entirely

Also, what you can see here is that all properties of unbounded strings are expressed through the conversion to the String type: this allows using the existing axiomatization of strings (through arrays) instead of redefining one.

Another design choice was made here: what is the content of a file we opened in read mode? In stdio.ads, we considered that the content of this file is strictly what we read from it. Because there is no way to know the content of the file, we decided to implement cat as a procedure that "writes whatever it reads". Any given file could be modified while reading, or, in the case of stdin, the parent process may also read part of the data.

The I/O library

As said before, the library consists of thin bindings to system calls. Interfacing scalar types (int, size_t, ssize_t) was easy. We also wanted to model more precisely the initialization of buffers. Indeed, when calling the Read procedure, the buffer might not be entirely initialized after the call; it is also possible to call Write on a partially initialized String, to copy the first n values that are initialized. The current version of SPARK requires that arrays (and thus Strings) be fully initialized. More details are available in the SPARK User's Guide. A new proposed evolution of the language (see here) is already prototyped in SPARK and allows manipulating partially initialized data safely.
In the prototype, we declare a new type, Init_Char, that carries the Init_By_Proof annotation:

   subtype Init_Char is Character;
   pragma Annotate (GNATprove, Init_By_Proof, Init_Char);
   type Init_String is array (Positive range <>) of Init_Char;

With this type declaration, it is now possible to use the attribute 'Valid_Scalars on Init_Char variables or slices of Init_String variables in ghost code to specify which scalars have been initialized.

The next part was writing the bindings to C functions. An example, for Read, is the following:

   function C_Read
     (Fd     : int;
      Buf    : System.Address;
      Size   : size_t;
      Offset : off_t) return ssize_t;
   pragma Import (C, C_Read, "read");

   procedure Read (Fd : int; Buf : out Init_String; Has_Read : out ssize_t) is
   begin
      Has_Read := C_Read (Fd, Buf'Address, Buf'Length, 0);
   end Read;

Other than hiding the use of addresses from SPARK, this part was not very difficult.

The final part was adding the contracts to our procedures. Firstly, there are no preconditions: system calls may return errors, but they accept any parameter as input. The only precondition we have in the library is on Write; it states that the characters we want to write to the file are initialized. Secondly, every postcondition is a case expression, where we give properties for each possible return value, e.g.:

   procedure Open (File : char_array; Flags : int; Fd : out int) with
     Global => (In_Out => (FD_Table, Errors.Error_State, Contents)),
     Post   =>
       (case Fd is
          when -1 =>
            Contents'Old = Contents,
          when 0 .. int (OPEN_MAX - 1) =>
            Length (Contents'Old) + 1 = Length (Contents)
              and then Contains (Contents, Fd)
              and then Length (Element (Contents, Fd)) = 0
              and then not Contains (Contents'Old, Fd)
              and then Model (Contents'Old) <= Model (Contents)
              and then M.Keys_Included_Except
                         (Model (Contents), Model (Contents'Old), Fd),
          when others => False);

The return value of Open will be either -1, which corresponds to an error, or a natural value (in my case, OPEN_MAX is equal to 1023, 1024 being the maximum number of files that can be open at the same time on my machine). If an error occurred, the Contents map is the same as before. If an appropriate file descriptor is returned, the postcondition states that the new Contents map has the same elements as before, plus a new empty unbounded string associated with the file descriptor. With regard to the functional model, these contracts are complete.

The Cat program

The cat program is split into two different parts: the main program, which opens and closes the file(s) in its arguments, and Copy_To_Stdout, a procedure that reads from the input file and writes what it read to stdout. Since errors in I/O can happen, we propagate them using a status flag from nested subprograms to the main program and handle them there. This point is the first difference from Ada.Text_IO. The other difference is the presence of postconditions about data, for example in the postcondition of Copy_To_Stdout:

   procedure Copy_To_Stdout (Input : int; Err : out int) with
     Post => (if Err = 0 then
                Element (Contents, Stdout) =
                  Element (Contents'Old, Stdout) & Element (Contents, Input));

This postcondition is the only functional contract we have for cat, which is why it is so important. It states that if no error occurred, the content of stdout is equal to its value before calling Copy_To_Stdout appended with the content we read from the input. If we wanted to write a more precise contract, the main difficulty would be to handle the cases where an error occurred.
For example, if we call cat on three different files, and one error occurs when copying the second file to stdout, we have no contract about the content of stdout, and everything becomes unprovable. Adding contracts for these cases would require working on slices and sliding and doing even more case-splitting, which would add more difficulty for the provers.

Type definitions and helper subprograms to define the library take about 200 lines of code. The library itself has around 100 lines of contracts. The cat program has 100 lines of implementation and 1200 lines of ghost code in order to prove everything. Around 1000 verification conditions for the entire project (I/O library + cat + lemmas) are discharged by the solvers to prove everything with auto-active proof.

Conclusion

Cat looks like a simple program, and it is. But being able to prove the correctness of cat shows that we are able to reason about contents and copying of data (here, between file descriptors), something which is necessary and largely identical for more complex applications like network drivers or servers that listen on sockets. We will be looking at extending our approach to code manipulating (network) sockets.

Using Ada for a Spanish Satellite Project https://blog.adacore.com/using-ada-for-a-spanish-satellite-project Tue, 18 Jun 2019 13:52:00 +0000 Juan Zamorano

I am an Associate Professor at the Polytechnic University of Madrid (Universidad Politécnica de Madrid / UPM) in the Department of Architecture and Technology of Computer Systems. For the past several years I have been directing a team of colleagues and students in the development of the UPMSat-2 microsatellite. The project originally started in 2013 as a follow-on to the UPM-SAT 1, launched by an Ariane-4 in 1995. The UPMSat-2 weighs 50 kg, and its geometric envelope is a parallelepiped with a base measuring 0.5 m x 0.5 m and a height of 0.6 m.
The microsatellite is scheduled to be launched September 9, 2019 on a Vega launcher, and is expected to be operational for two years. The primary goals of the project were:

• to improve the knowledge of the project participants, both professors and students;
• to demonstrate UPM's capabilities in space technology;
• to design, develop, integrate, test, launch and operate a microsatellite in orbit from within a university environment; and
• to develop a qualified space platform that can be used for general purpose missions aimed at educational, scientific and technological demonstration applications.

The project encompasses development of the software together with the platform, thermal control, attitude control, and other elements. In 2014, we selected AdaCore's GNAT cross-development environment for the UPMSat-2 microsatellite project's real-time on-board and ground control software. While Java is the primary language used to teach programming at UPM, Ada was chosen as the main programming language for our project because we considered it the most appropriate for developing high-integrity software. In total, the on-board software consists of more than 100 Ada packages, comprising over 38K lines of code. For the attitude control subsystem, we used C code that was generated automatically from Simulink® models (there are 10 source files in C, with a total of about 1,600 lines of code). We also used database and XML interfaces to support the development of the ground control software.

Ada is Easy to Learn

Since most of our students are last-year or graduate students, they generally have programming experience. However, they did not have experience with embedded systems or real-time programming, and none of them had any previous experience with GNAT, SPARK or Ada. To teach Ada to our students, we provided them with John Barnes' Programming in Ada 2012 textbook and spent a fair amount of time with it in the laboratory.
The difficult part was not in understanding and using Ada, but rather in understanding the software issues and programming style associated with concurrency, exceptions, real-time scheduling, and large system design. Fortunately, Ada's high-level concurrency model, simple exception facility, real-time support, and its many features for "programming in the large" (packages, data abstraction, child libraries, generics, etc.) helped to address these difficulties.

We also used a GNAT feature that saved us a lot of tedious coding time: the Scalar_Storage_Order attribute. The UPMSat-2's on-board computer is big-endian and the ground computer is little-endian. Therefore, we had to decode and encode every telemetry and telecommand message to deal with the endianness mismatch. I learned about the Scalar_Storage_Order feature at an AdaCore Tech Day, and it works really well, even for 12-bit packet types. Although it would have been nice to have some additional tool support for things like database and XML interfacing, we found the GNAT environment very intuitive and especially appreciated the GPS IDE; it's a great tool for developing software.

Why Ada?

I have been in love with Ada for a long time. I learned programming with Pascal and Concurrent Pascal (as well as Fortran and COBOL); I find it frustrating that, at many academic institutions, modular, strongly-typed, concurrent languages such as Ada have sometimes been replaced by others that have much weaker support for good programming practices. While I do not teach programming at UPM, my research group tries to use Ada whenever possible, because we consider it the most appropriate programming language for illustrating the concepts of real-time and embedded systems. I have to say that most of my students have also fallen in love with Ada. Our graduate students in particular appreciate the value and reliability that Ada brings to their final projects.
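To make the idea concrete, here is a hedged sketch (the record layout and the names Telemetry_Sketch, Packet, Packet_Type and Payload are hypothetical, not taken from the UPMSat-2 code) of how GNAT's Scalar_Storage_Order attribute lets the same declaration decode a big-endian telemetry word on a little-endian ground computer:

```ada
with System;

package Telemetry_Sketch is

   --  Hypothetical 32-bit telemetry word with a 12-bit packet type.
   type Packet is record
      Packet_Type : Natural range 0 .. 2**12 - 1;
      Payload     : Natural range 0 .. 2**20 - 1;
   end record;

   for Packet use record
      Packet_Type at 0 range 0 .. 11;
      Payload     at 0 range 12 .. 31;
   end record;

   --  GNAT-specific: fix the scalar storage order to big-endian, so
   --  the compiler inserts any byte swapping needed on a
   --  little-endian host instead of it being hand-coded.
   for Packet'Bit_Order use System.High_Order_First;
   for Packet'Scalar_Storage_Order use System.High_Order_First;
   for Packet'Size use 32;

end Telemetry_Sketch;
```

Reading or writing a Packet then produces the same byte layout on both the on-board computer and the ground computer.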
https://www.adacore.com/press/spanish-satellite-project

]]>

RFCs for Ada and SPARK evolution now on GitHub
https://blog.adacore.com/rfcs-for-ada-and-spark-evolution-now-on-github
Tue, 11 Jun 2019 12:46:54 +0000
Yannick Moy

Ever wished that Ada was more this and less that? Or that SPARK had such-and-such a feature for specifying your programs? Then you're not alone. The Ada-Comment mailing list is one venue for Ada language discussions, but many of us at AdaCore have felt the need for a more open discussion and prototyping of what goes into the Ada and SPARK languages. That's the main reason why we've set up a platform to collect, discuss and process language evolution proposals for Ada and SPARK. The platform is hosted on GitHub, and uses GitHub's built-in mechanisms to allow people to propose fixes or evolutions for Ada & SPARK, or give feedback on proposed evolutions.

For SPARK, the collaboration between Altran and AdaCore allowed us to completely redesign the language as a large subset of Ada, now including object orientation (added in 2015), concurrency (added in 2016) and even pointers (now available!), but we're reaching the point where catching up with Ada cannot lead us much further, and we need broader involvement from users to inform our strategic decisions. Regarding Ada, the language evolution has always been a collective effort by an international committee, but here too we feel that more user involvement would be beneficial to drive future evolution, including for the upcoming Ada 202X version. Note that there is no guarantee that changes discussed and eventually prototyped & implemented will ever make it into the Ada standard, even though AdaCore will do its best to collaborate with the Ada Rapporteur Group (ARG). You will see that we've started using the RFC process internally. That's just the beginning.
We plan to use this platform much more broadly within AdaCore and the Ada community to evolve Ada and SPARK in the future. Please join us in that collective effort if you are interested!

]]>

Using Pointers in SPARK
https://blog.adacore.com/using-pointers-in-spark
Thu, 06 Jun 2019 12:22:00 +0000
Claire Dross

I joined the SPARK team during the big revamp leading to the SPARK 2014 version of the proof technology. Our aim was to include in SPARK all the features of Ada 2012 that did not specifically cause problems for the formal verification process. Since that time, I have always seen the same introduction to the SPARK 2014 language: a subset of Ada, excluding features not easily amenable to formal analysis, followed by a list of the more notable excluded features. Over the years, this list of features has started to shrink, as (restricted) support for object-oriented programming, tasking and others was added to the language. Up until now, the most notable feature still missing, to my mind, was pointer support (or support for access types, as they are called in Ada). I always thought this was a feature we were never going to include. Indeed, absence of aliasing is a key assumption of the SPARK analysis, and removing it would induce so much additional annotation burden for users that it would make the tool hardly usable. This is what I believed, and this is what we kept explaining to users who regretted the absence of an often-used feature of the language.

I think it was work on the ParaSail language, and the emergence of the Rust language, that first made us look again into supporting this feature. Pointer ownership models did not originate with Rust, but Rust made the restrictions associated with them look tractable, and maybe even desirable from a safety point of view. So what is pointer ownership?
Basically, the idea is that an object designated by a pointer always has a single owner, which retains the right to either modify it, or (exclusive or) share it with others in a read-only way. Said otherwise, we always have either several copies of the pointer which allow only reading, or a single copy of the pointer which allows modification. So we have pointers, but in a way that allows us to ignore potential aliases… What a perfect fit for SPARK! So we began to look into how ownership rules were enforced in Rust and ParaSail, and how we could adapt some of them for Ada without introducing too many special cases and new annotations. In this post, I will show you what we came up with. Don't hesitate to comment and tell us what you like / don't like about this feature.

The main idea used to enforce single ownership for pointers is the move semantics of assignments. When a pointer is copied through an assignment statement, the ownership of the pointer is transferred to the left-hand side of the assignment. As a result, the right-hand side loses the ownership of the object, and therefore loses the right to access it, both for writing and reading. In the example below, the assignment from X to Y causes X to lose ownership of the value it references. As a result, the last assertion, which reads the value of X, is illegal in SPARK, leading to an error message from GNATprove:

procedure Test is
   type Int_Ptr is access Integer;
   X : Int_Ptr := new Integer'(10);
   Y : Int_Ptr;                  -- Y is null by default
begin
   Y := X;                       -- ownership of X is transferred to Y
   pragma Assert (Y.all = 10);   -- Y can be accessed
   Y.all := 11;                  -- both for reading and writing
   pragma Assert (X.all = 11);   -- but X cannot, or we would have an alias
end Test;

test.adb:9:20: insufficient permission on dereference from "X"
test.adb:9:20: object was moved at line 6

In this example, we can see the point of these ownership rules.
To correctly reason about the semantics of a program, SPARK needs to know, when a change is made, which objects are potentially impacted. Because it assumes that there can be no aliasing (at least no aliasing of mutable data), the tool can easily determine which parts of the environment are updated by a statement, be it a simple assignment or, for example, a procedure call. If we were to break this assumption, we would need either to assume the worst (that all references can be aliases of each other) or to require the user to explicitly annotate subprograms to describe which references can be aliased and which cannot. In our example, SPARK can deduce that an assignment to Y cannot impact X. This is only correct because of the ownership rules that prevent us from accessing the value of X after the update of Y.

Note that a variable which has been moved is not necessarily lost for the rest of the program. Indeed, it is possible to assign it again, restoring ownership. For example, here is a piece of code that swaps the pointers X and Y:

declare
   Tmp : Int_Ptr := X;  -- ownership of X is moved to Tmp
                        -- X cannot be accessed.
begin
   X := Y;              -- ownership of Y is moved to X
                        -- Y cannot be accessed
                        -- X is unrestricted.
   Y := Tmp;            -- ownership of Tmp is moved to Y
                        -- Tmp cannot be accessed
                        -- Y is unrestricted.
end;

This code is accepted by the SPARK tool. Intuitively, we can see that writing at top level into X after it has been moved is OK, since it will not modify the actual owner of the moved value (here Tmp). However, writing into X.all is forbidden, as it would affect Tmp (don't hesitate to look at the SPARK Reference Manual if you are interested in the formal rules of the move semantics). For example, the following variant is rejected:

declare
   Tmp : Int_Ptr := X;  -- ownership of X is moved to Tmp
                        -- X cannot be accessed.
begin
   X.all := Y.all;

insufficient permission on dereference from "X"
object was moved at line 2

Moving is not the only way to transfer ownership. It is also possible to borrow the ownership of (a part of) an object for a period of time. When the borrower disappears, the borrowed object regains ownership and is accessible again. This is what happens, for example, for mutable parameters of a subprogram when the subprogram is called. The ownership of the actual parameter is transferred to the formal parameter for the duration of the call, and is returned when the subprogram terminates. In particular, this disallows procedures that move some of their parameters away, as in the following example:

type Int_Ptr_Holder is record
   Content : Int_Ptr;
end record;

procedure Move (X : in out Int_Ptr_Holder; Y : in out Int_Ptr_Holder) is
begin
   X := Y;  -- ownership of Y.Content is moved to X.Content
end Move;

insufficient permission for "Y" when returning from "Move"
object was moved at line 3

Note that I used a record type for the type of the parameters. Indeed, the SPARK RM has special wording for in out parameters of an access type, stating that they are not borrowed but moved on entry to and exit from the subprogram. This allows us to move in out access parameters, which would otherwise be forbidden, as borrowed top-level access objects cannot be moved.
The SPARK RM also allows declaring local borrowers in a nested scope by using an anonymous access type:

declare
   Y : access Integer := X;  -- Y borrows the ownership of X
                             -- for the duration of the declare block
begin
   pragma Assert (Y.all = 10);  -- Y can be accessed
   Y.all := 11;                 -- both for reading and writing
end;
pragma Assert (X.all = 11);     -- The ownership of X is restored,
                                -- it can be accessed again

But this is not supported yet by the proof tool, as it raises the complex issue of tracking modifications of X that were done through Y during its lifetime:

local borrower of an access object is not yet supported

It is also possible to share a single reference between several readers. This mechanism is called observing. When a variable is observed, both the observed object and the observer retain the right to read the object, but neither can modify it. As with borrowing, when the observer disappears, the observed object regains the permissions it had before (read-write or read-only). Here is an example. We have a list L, defined as a recursive pointer-based data structure in the usual way. We then observe its tail by introducing a local observer N using an anonymous access-to-constant type. We then do it again to observe the tail of N:

declare
   N : access constant List := L.Next;  -- observe part of L
begin
   declare
      M : access constant List := N.Next;  -- observe again part of N
   begin
      pragma Assert (M.Val = 3);  -- M can be read
      pragma Assert (N.Val = 2);  -- but we can still read N
      pragma Assert (L.Val = 1);  -- and even L
   end;
end;
L.Next := null;  -- all observers are out of scope, we can modify L

We can see that the three variables retain the right to read their content. But that is OK, as none of them is allowed to update it. When no more observers exist, it is again possible to modify L.

In addition to single ownership, SPARK restricts the use of access types in several ways. The most notable one is that SPARK does not allow general access types.
The reason is that we did not want to deal with accesses to variables defined on the stack, and with accessibility levels. Also, access types cannot be stored in subcomponents of tagged types, to avoid having access types hidden in record extensions.

To get convinced that the rules enforced by SPARK still allow common use cases, I think it is best to look at an example. A common use case for pointers in Ada is to store indefinite types inside data structures. Indefinite types are types whose subtype is not known statically; this is the case, for example, for unconstrained arrays. Since the size of an indefinite type is not known statically, it is not possible to store it inside a data structure, such as another array or a record. For example, as strings are arrays, it is not possible to create an array that can hold strings of arbitrary length in Ada. The usual work-around consists in adding an indirection via the use of pointers, storing pointers to indefinite elements inside the data structure. Here is an example of how this can now be done in SPARK, for a minimal implementation of a dictionary. A simple vision of a dictionary is an array of strings. Since strings are indefinite, I need to define an access type to be allowed to store them inside an array:

type Word is not null access String;
type Dictionary is array (Positive range <>) of Word;

We can then search for a word in a dictionary. The function below is successfully verified in SPARK. In particular, SPARK is able to verify that no null pointer dereference may happen, due to Word being an access type with a null exclusion:

function Search (S : String; D : Dictionary) return Natural with
  Post => (Search'Result = 0
           and then (for all I in D'Range => D (I).all /= S))
    or else (Search'Result in D'Range
             and then D (Search'Result).all = S)
is
begin
   for I in D'Range loop
      pragma Loop_Invariant
        (for all K in D'First .. I - 1 => D (K).all /= S);
      if D (I).all = S then
         return I;
      end if;
   end loop;
   return 0;
end Search;

Now imagine that I want to modify one of the words stored in my dictionary. The words may not have the same length, so I need to replace the pointer in the array. For example:

My_Dictionary (1) := new String'("foo");
pragma Assert (My_Dictionary (1).all = "foo");

But this is not great, as now I have a memory leak. Indeed, the value previously stored in My_Dictionary is no longer accessible, and it has not been deallocated. The SPARK tool does not currently complain about this problem, even though the SPARK definition says it should (it has not been implemented yet). But let's try to correct our code nevertheless, by storing the value previously in the dictionary in a temporary and deallocating it afterward. First I need a deallocation function. In Ada, one can be obtained by instantiating the generic procedure Ada.Unchecked_Deallocation with the appropriate types (note that, as access objects are set to null after deallocation, I had to introduce a base type for Word without the null exclusion constraint):

type Word_Base is access String;
subtype Word is not null Word_Base;
procedure Free is new Ada.Unchecked_Deallocation (String, Word_Base);

Then, I can try to do the replacement:

declare
   Temp : Word_Base := My_Dictionary (1);
begin
   My_Dictionary (1) := new String'("foo");
   Free (Temp);
   pragma Assert (My_Dictionary (1).all = "foo");
end;

Unfortunately this does not work; the SPARK tool complains with:

test.adb:36:37: insufficient permission on dereference from "My_Dictionary"
test.adb:36:37: object was moved at line 31

where line 31 is the line where Temp is defined and line 36 is the assertion. So, what is happening? In fact, this is due to the way checking of single ownership is done in SPARK.
As the analysis used for this verification is not value-dependent, when a cell of an array is moved, the tool is never able to determine whether or not ownership of an array cell has been regained. As a result, if an element of an array is moved away, the array will never become readable again unless it is assigned as a whole. Better to avoid moving elements of an array in these conditions, right? So what can we do? If we cannot move, what about borrowing... Let us try with an auxiliary Swap procedure:

procedure Swap (X, Y : in out Word_Base) with
  Pre  => X /= null and Y /= null,
  Post => X /= null and Y /= null
    and X.all = Y.all'Old and Y.all = X.all'Old
is
   Temp : Word_Base := X;
begin
   X := Y;
   Y := Temp;
end Swap;

declare
   Temp : Word_Base := new String'("foo");
begin
   Swap (My_Dictionary (1), Temp);
   Free (Temp);
   pragma Assert (My_Dictionary (1).all = "foo");
end;

Now everything is fine. The ownership of My_Dictionary (1) is temporarily transferred to the X formal parameter of Swap for the duration of the call, and it is restored at the end. Now the SPARK tool can ensure that My_Dictionary indeed has full ownership of its content after the call, and the read inside the assertion succeeds. This small example is also verified by the SPARK tool.

I hope this post gave you a taste of what it would be like to program using pointers in SPARK. If you now feel like using them, a preview is available in the Community 2019 edition of GNAT+SPARK. Don't hesitate to come back to us with your findings, either on GitHub or by email.

]]>

GNAT Community 2019 is here!
https://blog.adacore.com/gnat-community-2019-is-here
Wed, 05 Jun 2019 12:56:00 +0000
Nicolas Setton

We are pleased to announce that GNAT Community 2019 has been released! See https://www.adacore.com/download.
This release is supported on the same platforms as last year:

• Windows, Linux, and Mac 64-bit native
• RISC-V hosted on Linux
• ARM 32-bit hosted on 64-bit Linux, Mac, and Windows

GNAT Community now includes a number of fixes and enhancements, most notably:

• The installer for Windows and Linux now contains pre-built binary distributions of Libadalang, a very powerful language tooling library for Ada and SPARK.

Check out the README for some additional platform-specific notes. We hope you enjoy using SPARK and Ada!

]]>

Bringing Ada To MultiZone
https://blog.adacore.com/bringing-ada-to-multizone
Wed, 29 May 2019 21:30:00 +0000
Boran Car

Introduction

C is the dominant language of the embedded world, almost to the point of exclusivity. Due to its age, and its goal of being a "portable assembler", it deliberately lacks the type safety that languages like Ada provide. The lack of type safety in C is one of the reasons embedded device exploits are so common. Proposed solutions include partitioning the application into smaller intercommunicating blocks, designed with the principle of least privilege in mind, and rewriting the application in a type-safe language. We believe that the two approaches are complementary, and want to show you how to combine the separation and isolation provided by MultiZone with iteratively rewriting parts in Ada.

We will take the MultiZone SDK demo and rewrite one of the zones in Ada. The full demo simulates an industrial application with a robotic arm. It runs on the Arty A7-35T board and interfaces with the PC and a robotic arm (OWI-535 Robotic Arm) via an SPI to USB converter. More details are available from the MultiZone Security SDK for Ada manual (https://github.com/hex-five/multizone-ada/blob/master/manual.pdf). We will just be focusing on the porting process here.
MultiZone Security

MultiZone(TM) Security is the first Trusted Execution Environment for RISC-V. It enables the development of a simple, policy-based security environment for RISC-V that supports rich operating systems through to bare metal code. It is the culmination of the embedded security best practices developed over the last decade, now applied to RISC-V processors. Instead of splitting into a secure and a non-secure domain, MultiZone(TM) Security provides policy-based, hardware-enforced separation for an unlimited number of security domains, with full control over data, code and peripherals.

MultiZone(TM) Security consists of the following components:

• MultiZone(TM) nanoKernel - lightweight, formally verifiable, bare metal kernel providing policy-driven hardware-enforced separation of ram, rom, i/o and interrupts.
• InterZone(TM) Messenger - communications infrastructure to exchange secure messages across zones on a no-shared-memory basis.
• MultiZone(TM) Configurator - combines fully linked zone executables with policies and kernel to generate the signed firmware image.
• MultiZone(TM) Signed Boot - 2-stage signed boot loader to verify integrity and authenticity of the firmware image (sha-256 / ECC).

Contrary to traditional solutions, MultiZone(TM) Security requires no additional hardware, dedicated cores or clunky programming models. Open source libraries, third party binaries and legacy code can be configured in minutes to achieve unprecedented levels of safety and security. See https://hex-five.com/ for more details, or check out the MultiZone SDK repository on GitHub - https://github.com/hex-five/multizone-sdk.

Ada on MultiZone

We port zone 3, the zone controlling the robotic arm, to Ada. The zone communicates with other zones via MultiZone APIs and with the robotic arm by bit-banging GPIO pins.
New Runtime

MultiZone zones differ from bare metal applications in that access to resources is restricted - a zone has only a portion of the RAM and FLASH and can only access some of the peripherals. Looking at our configuration (https://github.com/hex-five/multizone-ada/blob/master/bsp/X300/multizone.cfg), zone 3 has the following access privileges:

Zone = 3
base = 0x20430000; size = 64K; rwx = rx   # FLASH
base = 0x80003000; size = 4K; rwx = rw    # RAM
base = 0x0200BFF8; size = 0x8; rwx = r    # RTC
base = 0x10012000; size = 0x100; rwx = rw # GPIO

In the Ada world, this translates to having a separate runtime that we need to create. Luckily, AdaCore has released the sources of their existing runtimes on GitHub - https://github.com/adacore/bb-runtimes - and they have also included a how-to for creating new runtimes - https://github.com/AdaCore/bb-runtimes/tree/community-2018/doc/porting_runtime_for_cortex_m. Here's how we create our customized HiFive1 runtime:

./build_rts.py --bsps-only --output=build --prefix=lib/gnat hifive1

This creates sources for building a runtime using a mix of sources from bb-runtimes and from the compiler itself, thanks to the --bsps-only flag. Without this switch, we would need the original GNAT repository, which is not publicly available. Notice we don't use --link, so our runtime sources are a proper copy and can be checked into a new git repo - https://github.com/hex-five/multizone-ada/tree/master/bsp/X300/runtime.

Our runtime needs to be compiled and installed before it can be used, and we do that automatically as part of the Makefile for zone3 - https://github.com/hex-five/multizone-ada/blob/master/zone3/Makefile:

BSP_BASE := ../bsp
PLATFORM_DIR := $(BSP_BASE)/$(BOARD)
RUNTIME_DIR := $(PLATFORM_DIR)/runtime
GPRBUILD := $(abspath$(GNAT))/bin/gprbuild
GPRINSTALL := $(abspath$(GNAT))/bin/gprinstall

.PHONY: all
all:
$(GPRBUILD) -p -P$(RUNTIME_DIR)/zfp_hifive1.gpr
$(GPRINSTALL) -f -p -P $(RUNTIME_DIR)/zfp_hifive1.gpr --prefix=$(RUNTIME_DIR)
$(AR) cr $(RUNTIME_DIR)/lib/gnat/zfp-hifive1/adalib/libgnat.a $(RUNTIME_DIR)/hifive1/zfp/obj/*.o
$(GPRBUILD) -f -p -P zone3.gpr
$(OBJCOPY) -O ihex obj/main zone3.hex --gap-fill 0x00

Note that we use a combination of GPRbuild and Make to minimize code differences between the two repositories as much as possible. GPRbuild is used only for zone3.

Board support

We use the Ada Drivers Library on GitHub (https://github.com/AdaCore/Ada_Drivers_Library) as a starting point as it provides examples for a variety of boards and architectures. Our target is the X300, itself a modified HiFive1/FE310/FE300. The differences between FE310/FE300 and X300 are detailed on the multizone-fpga GitHub repository (https://github.com/hex-five/multizone-fpga).

We need to change the drivers slightly for our application - we want LD0 to indicate the status of the robotic arm, blinking red when disconnected and green when connected:

with FE310.Device; use FE310.Device;
with SiFive.GPIO; use SiFive.GPIO;

package Board.LEDs is

subtype User_LED is GPIO_Point;

Red_LED   : User_LED renames P01;
Green_LED : User_LED renames P02;
Blue_LED  : User_LED renames P03;

procedure Initialize;
-- MUST be called prior to any use of the LEDs

procedure Turn_On (This : in out User_LED) renames SiFive.GPIO.Set;
procedure Turn_Off (This : in out User_LED) renames SiFive.GPIO.Clear;
procedure Toggle (This : in out User_LED) renames SiFive.GPIO.Toggle;

procedure All_LEDs_Off with Inline;
procedure All_LEDs_On with Inline;
end Board.LEDs;

Having the package named Board allows us to change the target board at compile time by just providing the folder containing the Board package. This is in line with what multizone-sdk does.
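As an illustration of this design choice (a hedged sketch: the actual zone3.gpr in the repository may differ, and the BOARD variable and folder layout here are assumptions), the board folder can be selected through a GPRbuild scenario variable:

```ada
--  Hypothetical zone3.gpr excerpt: the BOARD external variable picks
--  which folder's Board package gets compiled in, so switching boards
--  needs no source changes, only a different -XBOARD=... on the
--  gprbuild command line.
project Zone3 is

   Board := external ("BOARD", "X300");

   for Source_Dirs use ("src", "../bsp/" & Board & "/board");
   for Object_Dir use "obj";
   for Main use ("main.adb");

end Zone3;
```

With this scheme, `gprbuild -P zone3.gpr -XBOARD=X300` selects the X300 Board package without touching the application sources.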

Code

MultiZone support

MultiZone nanoKernel offers trap-and-emulate, so existing applications can be given to MultiZone directly, unmodified, and will work as expected. Access to RAM, FLASH and peripherals needs to be allowed in the configuration file, though. MultiZone also provides functionality for increased performance, better power usage and interzone communication via the API in Libhexfive. Here we create an Ada wrapper around libhexfive32.a, providing the MultiZone-specific calls:

package MultiZone is

procedure ECALL_YIELD; -- libhexfive.h:8
pragma Import (C, ECALL_YIELD, "ECALL_YIELD");

procedure ECALL_WFI; -- libhexfive.h:9
pragma Import (C, ECALL_WFI, "ECALL_WFI");
...
end MultiZone;

A typical MultiZone optimized application will yield whenever it doesn’t have anything to do, to save on processing time and power:

with MultiZone; use MultiZone;

procedure Main is
begin
-- Application initialization
loop
-- Application code
ECALL_YIELD;
end loop;
end Main;

If a zone wants to communicate with other zones, such as receiving commands and sending back replies, it needs to use ECALL_SEND/ECALL_RECV. These send/receive a chunk of 16 bytes and return a status indicating whether the send/receive was successful. The prototypes are a bit special, as they take a void * parameter, which translates to System.Address:

function ECALL_SEND_C (arg1 : int; arg2 : System.Address) return int; -- libhexfive.h:11
pragma Import (C, ECALL_SEND_C, "ECALL_SEND");
function ECALL_RECV_C (arg1 : int; arg2 : System.Address) return int; -- libhexfive.h:12
pragma Import (C, ECALL_RECV_C, "ECALL_RECV");

We wrap these to provide a more Ada idiomatic alternative:

type Word is new Unsigned_32;
type Message is array (0 .. 3) of aliased Word;
pragma Pack (Message);
subtype Zone is int range 1 .. int'Last;

function Ecall_Send (to : Zone; msg : Message) return Boolean;
function Ecall_Recv (from : Zone; msg : out Message) return Boolean;

We hide the System.Address usage and provide a safer subtype for the source/destination zone, since a zone number cannot be negative or zero.

function Ecall_Send (to : Zone; msg : Message) return Boolean is
begin
return ECALL_SEND_C (to, msg'Address) = 1;
end Ecall_Send;

function Ecall_Recv (from : Zone; msg : out Message) return Boolean is
begin
return ECALL_RECV_C (from, msg'Address) = 1;
end Ecall_Recv;

With all the primitives in place, we can make a simple MultiZone optimized application that can respond to a ping from another zone:

with HAL;       use HAL;
with MultiZone; use MultiZone;

procedure Main is
begin
-- Application initialization
loop
-- Application code
declare
msg : Message;
Status : Boolean := Ecall_Recv (1, msg);
begin
if Status then
if msg(0) = Character'Pos('p') and
msg(1) = Character'Pos('i') and
msg(2) = Character'Pos('n') and
msg(3) = Character'Pos('g') then
Status := Ecall_Send (1, msg);
end if;
end if;
end;

ECALL_YIELD;
end loop;
end Main;

Keeping some legacy (OWI Robot)

We keep the SPI functionality and the Owi Task as C files and just create Ada bindings for them. spi_c.c implements the SPI protocol by bit-banging GPIO pins:

package Spi is

procedure spi_init; -- ./spi.h:8
pragma Import (C, spi_init, "spi_init");

function spi_rw (cmd : System.Address) return UInt32; -- ./spi.h:9
pragma Import (C, spi_rw, "spi_rw");

end Spi;

The OwiTask (owi_task.c) is a state machine containing the different robot sequences. The main function is owi_task_run, while the others change the active state. owi_task_run returns the next command to send via SPI for a given moment in time:

package OwiTask is

end OwiTask;

The following Ada code runs the state machine for the OWI robot. Since the target functions are written in C, we see how Ada can interact with legacy C code:

-- OWI sequence run
if usb_state = 16#12670000# then
declare
cmd_bytes : Cmd;
begin
if cmd_word /= -1 then
cmd_bytes(0) := UInt8 (cmd_word and 16#FF#);
cmd_bytes(1) := UInt8 (Shift_Right (cmd_word,  8) and 16#FF#);
cmd_bytes(2) := UInt8 (Shift_Right (cmd_word, 16) and 16#FF#);
ping_timer := CLINT.Machine_Time + PING_TIME;
end if;
end;
end if;

We now port over the owi sequence selection:

declare
Status : Boolean := Ecall_Recv (1, msg);
begin
if Status then
-- OWI sequence select
if usb_state = 16#12670000# then
case msg(0) is
when others => null;
end case;
end if;
...

We leave it as an exercise for the reader to port the OWI sequence handler code to Ada.

We want the LED color to change as a result of the robot connected or disconnected events. Each of the LED colors is a separate pin on the board:

Red_LED   : User_LED renames P01;
Green_LED : User_LED renames P02;
Blue_LED  : User_LED renames P03;

One way of handling this is to create an Access Type that can store the currently selected LED color:

procedure Main is
type User_LED_Select is access all User_LED;
...
LED : User_LED_Select := Red_LED'Access;
...
begin
...
-- Detect USB state every 1sec
if CLINT.Machine_Time > ping_timer then
ping_timer := CLINT.Machine_Time + PING_TIME;
end if;

-- Update USB state
declare
Status : int;
begin
if rx_data /= usb_state then
usb_state := rx_data;

if rx_data = UInt32'(16#12670000#) then
LED := Green_LED'Access;
else
LED := Red_LED'Access;
end if;
end if;
end;

The actual blinking is then just a matter of dereferencing the Access Type and calling the right function:

    -- LED blink
if CLINT.Machine_Time > led_timer then
if GPIO.Set (Red_LED) or GPIO.Set (Green_LED) or GPIO.Set (Blue_LED) then
All_LEDs_Off;
led_timer := CLINT.Machine_Time + LED_OFF_TIME;
else
All_LEDs_Off;
Turn_On (LED.all);
led_timer := CLINT.Machine_Time + LED_ON_TIME;
end if;
end if;

Closing

If you would like to know more about MultiZone and Ada, please reach out to Hex-Five Security - https://hex-five.com/contact-2/. I will also be attending the RISC-V Workshop in Zurich - https://tmt.knect365.com/risc-v-workshop-zurich/ if you would like to grab a coffee and discuss MultiZone and Ada on RISC-V.

]]>

Using SPARK to prove absence of run time errors on a hobby project.

The Danish Technical University (DTU) has a yearly RoboCup where autonomous vehicles solve a number of challenges. Each solved challenge gives points, and the team with the most points wins. The track is available for testing for two weeks before the qualification round.

The idea behind and creation of RoadRunner for DTU RoboCup 2019.

RoadRunner is a 3D printed robot with wheel suspension, based on the BeagleBone Blue ARM-based board and the Pixy 1 camera with custom firmware enabling real-time line detection. The code is written in Ada and formally proved with SPARK at the Silver level, which establishes that the code executes without run-time errors such as overflows, race conditions and deadlocks. During development, SPARK prevented numerous hard-to-debug errors from ever reaching the robot. The time-boxed testing period of the DTU RoboCup made early error detection even more valuable. In our opinion, the estimated time saved on debugging more than outweighs the time we spent on SPARK. The robot may still fail, but it will not be due to run-time errors in our code...

Ada was the obvious choice: we both have years of experience with it. The BeagleBone Blue runs Debian Stable, and we cross compile from a Debian laptop using the default GNAT FSF compiler from Debian Testing, which is easy to set up and not much older than GNAT Community Edition.

We initially decided not to use SPARK. According to some internet articles, SPARK had problems proving properties about floating-point arithmetic, and the Ada subset allowed for tasking seemed too restricted. Floating-point support has since improved, and tasking was extended to the Jorvik (extended Ravenscar) profile.

A race condition in the code changed the SPARK decision. A lot of valuable time was spent chasing it: the GNAT runtime on Debian armhf has no traceback info, so it was difficult to find what caused a Storage_Error. SPARK flow analysis detects race conditions, so it would prevent that kind of issue in the future.

It took a lot of effort before SPARK was able to do flow analysis without errors. SPARK rejects code with exception handlers, fixed-point to floating-point conversions, the standard Ada.Containers and task entries. Hopefully, future versions of SPARK will be able to analyze such code, or at least give a warning and continue the analysis even when unsupported Ada features are present. Once the code base is in SPARK, it is quite easy to add new code in SPARK.

Experience with SPARK.

I had to start from scratch, with no previous experience with SPARK, formal methods or safety critical software. I read the online documentation to find examples on how to prove different types of properties.

SPARK is a tough code reviewer. It detects bad design decisions and forces a rewrite. Bad design is impossible to test and prove and SPARK does not miss any of it. Very valuable for small projects without any code review.

SPARK code is easier to read. Defensive code is moved from exception handlers and if statements to preconditions. The resulting code has less branching and is easier to understand and test.

SPARK code is also more difficult to read. Loop invariants and assertions to prove loops can outnumber the Ada statements in the loop.

We used saturation as a shortcut to prove absence of overflows, in locations where some values could not be proved to stay in bounds. But this is not an ideal solution: if it does saturate, then the result is not correct.
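Saturation here means clamping a result into its legal range instead of letting it overflow. A minimal sketch of the idea (illustrative Python, not our Ada code):

```python
def saturate(value, lo, hi):
    """Clamp value into [lo, hi]. A prover then sees the result in bounds
    by construction, but a clamped result is no longer the mathematically
    correct one, which is exactly the trade-off described above."""
    return max(lo, min(hi, value))
```

For example, saturate(300, -128, 127) yields 127: provably in range, but not the true value.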

In our experience, proving absence of run time errors also detects real bugs, not only division by zero and overflows.

Two examples:

SPARK could not prove that a resulting value was in range: a scaling factor had been divided by instead of multiplied by.

SPARK could not prove that an array index was in range: X and Y had been swapped in a camera image calculation.

Stability.

If code is in SPARK, then all changes must be analyzed by SPARK. Almost every coding error gives a run-time exception, and that happened several times on the track during testing. But when the code was analyzed with SPARK, it always pointed at the exact line with the bug. After learning that lesson, we found that code analyzed with SPARK always worked the first time.

Having a stable build environment with SPARK and automated tests made it possible to make successful last-minute changes to the software just before the final run shown in the video below.

Here is the presentation video for our robot:

and the video of the final winning challenge:

Using SPARK to Prove 255-bit Integer Arithmetic from Curve25519
Joffrey Huguet, Tue, 30 Apr 2019
https://blog.adacore.com/using-spark-to-prove-255-bit-integer-arithmetic-from-curve25519

In 2014, Adam Langley, a well-known cryptographer from Google, wrote a post on his personal blog, in which he tried to prove functions from curve25519-donna, one of his projects, using various verification tools: SPARK, Frama-C, Isabelle... He describes this attempt as "disappointing", because he could not manage to prove "simple" things, like absence of runtime errors. I will show in this blogpost that today, it is possible to prove what he wanted to prove, and even more.

Algorithms in elliptic-curve cryptography compute numbers that are too large to fit into the registers of typical CPUs. In this case, curve25519 uses integers ranging from 0 to $2^{255} - 19$, which can't be represented as native 32- or 64-bit integers. In the original implementations from Daniel J. Bernstein and in curve25519-donna, these integers are represented by arrays of 10 smaller, 32-bit integers. Each of these integers is called a "limb". Limbs are alternately 26 and 25 bits long. This forms a little-endian representation of the bigger integer, with low-weight limbs first. In section 4 of this paper, Bernstein explains both the motivation and the details of his representation. The formula to convert an array A of 32-bit integers into the big integer it represents is: $\sum_{i=0}^{9} A (i) \times 2^{\lceil 25.5i \rceil}$ where the half square brackets represent the ceiling function. I won't focus on implementation details here, but you can read about them in the paper above. To me, the most interesting part of this project is its proof.
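As a quick sanity check, the conversion formula can be computed directly in Python (this is my own illustration, not code from the project):

```python
import math

def to_big_integer(limbs):
    """Big integer represented by 10 little-endian limbs:
    sum of limbs[i] * 2**ceil(25.5 * i)."""
    return sum(a << math.ceil(25.5 * i) for i, a in enumerate(limbs))

# The limb weights advance by alternating 26-bit and 25-bit steps:
weights = [math.ceil(25.5 * i) for i in range(10)]
# weights == [0, 26, 51, 77, 102, 128, 153, 179, 204, 230]
```

The differences between consecutive weights are 26, 25, 26, 25, ..., which is exactly the alternating limb-size scheme described above.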

First steps

Types

The first step was to define the different types used in the implementation and proof.

I'll use two index types. Index_Type, which is range 0 .. 9, is used to index the arrays that represent the 255-bit integers. Product_Index_Type is range 0 .. 18, used to index the output array of the Multiply function.

Integer_Curve25519 represents the arrays that will be used in the implementation, so 64-bit integer arrays with variable bounds in Product_Index_Type.  Product_Integer is the array type of the Multiply result. It is a 64-bit integer array with Product_Index_Type range. Integer_255 is the type of the arrays that represent 255-bit integers, so arrays of 32-bit integers with Index_Type range.

Big integers library

Proving Add and Multiply in SPARK requires reasoning about the big integers represented by the arrays. In SPARK, we don't have mathematical integers (i.e. integers with infinite precision), only bounded integers, which are not sufficient for our use case. If I had tried to prove correctness of Add and Multiply in SPARK with bounded integers, the tool would have triggered overflow checks every time an array is converted to integer.

To overcome this problem, I wrote the specification of a minimal big integers library, that defines the type, relational operators and basic operations. It is a specification only, so it is not executable.

SPARK is based on Why3, which has support for mathematical integers, and there is a way to link a Why3 definition to a SPARK object directly. This is called external axiomatization; you can find more details about how to do it here.

Using this feature, I could easily provide a big integers library with basic functions like "+", "*" or To_Big_Integer (X : Integer). As previously mentioned, this library is usable for proof, but not executable (subprograms are marked as Import).  To avoid issues during binding, I used a feature of SPARK named Ghost code. It takes the form of an aspect: "Ghost" indicates to the compiler that this code cannot affect the values computed by ordinary code, thus can be safely erased. It's very useful for us, since this aspect can be used to write non-executable functions which are only called in annotations.

Conversion function

One of the most important functions used in the proof is a function that converts the array of 10 integers to the big integer it represents, so when X is an Integer_255 then +X is its corresponding big integer.

function "+" (X : Integer_Curve25519) return Big_Integer
renames To_Big_Integer;

After this, I can define To_Big_Integer recursively, with the aid of an auxiliary function, Partial_Conversion.

function Partial_Conversion
  (X : Integer_Curve25519;
   L : Product_Index_Type)
   return Big_Integer
is
  (if L = 0
   then (+X (0)) * Conversion_Array (0)
   else Partial_Conversion (X, L - 1) + (+X (L)) * Conversion_Array (L))
with
  Ghost,
  Pre => L in X'Range;

function To_Big_Integer (X : Integer_Curve25519) return Big_Integer is
  (Partial_Conversion (X, X'Last))
with
  Ghost,
  Pre => X'Length > 0;

With these functions defined, it was much easier to write specifications of Add and Multiply. All_In_Range is used in preconditions to bound the parameters of Add and Multiply in order to avoid overflow.

function All_In_Range
  (X, Y     : Integer_255;
   Min, Max : Long_Long_Integer)
   return Boolean
is
  (for all J in X'Range =>
     X (J) in Min .. Max
     and then Y (J) in Min .. Max);

function Add (X, Y : Integer_255) return Integer_255 with
  Pre  => All_In_Range (X, Y, -2**30 + 1, 2**30 - 1),
  Post => +Add'Result = (+X) + (+Y);

function Multiply (X, Y : Integer_255) return Product_Integer with
  Pre  => All_In_Range (X, Y, -2**27 + 1, 2**27 - 1),
  Post => +Multiply'Result = (+X) * (+Y);
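A back-of-the-envelope check (my own, not from the post) of why the 2**27 bound in Multiply's precondition avoids overflow: each entry of the result accumulates at most 10 limb products, each scaled by at most 2, so everything stays well within a signed 64-bit accumulator.

```python
# Worst case for one entry of the product: 10 terms |X(J) * Y(K)|, each
# scaled by at most 2, with |X(J)|, |Y(K)| <= 2**27 - 1 (All_In_Range).
max_limb = 2**27 - 1
max_entry = 10 * 2 * max_limb * max_limb
assert max_entry < 2**63  # fits comfortably in 64-bit signed arithmetic

# The looser 2**30 bound used for Add keeps a single sum trivially in range:
assert 2 * (2**30 - 1) < 2**63
```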

Proof

Proving absence of runtime errors was relatively simple: no annotations were needed for Add, and only a few very classical loop invariants for Multiply. Multiply is where Adam Langley stopped, because he couldn't prove absence of overflow. I will focus on the proof of functional correctness, which was a much more difficult step. In SPARK, Adam Langley couldn't prove it for Add because of the overflow checks triggered by the tool when converting the arrays to the big integers they represent. This is exactly where the big integers library is useful: it makes it possible to manipulate big integers in proof without overflow checks.

The method I used to prove both functions is the following:

1. Create a function that allows tracking the content of the returned array.
2. Actually track the content of the returned array through loop invariant(s) that ensure equality with the function from point 1.
3. Prove the equivalence between the loop invariant(s) at the end of the loop and the postcondition, in a ghost procedure.
4. Call that procedure right before the return statement.

I will illustrate this method with Add. The final implementation of Add is the following:

function Add (X, Y : Integer_255) return Integer_255 is
   Sum : Integer_255 := (others => 0);
begin
   for J in Sum'Range loop
      Sum (J) := X (J) + Y (J);

      pragma Loop_Invariant (for all K in 0 .. J =>
                               Sum (K) = X (K) + Y (K));
   end loop;
   return Sum;
end Add;

Points 1 and 2 are ensured by the loop invariant; no new function was created, since the content of the array is simple (just an addition). At the end of the loop, the information we have is: "for all K in 0 .. 9 => Sum (K) = X (K) + Y (K)", which is information about the whole array. Points 3 and 4 are ensured by Prove_Add. The specification of Prove_Add is:

procedure Prove_Add (X, Y, Sum : Integer_255) with
  Ghost,
  Pre  => (for all J in Sum'Range => Sum (J) = X (J) + Y (J)),
  Post => To_Big_Integer (Sum) = To_Big_Integer (X) + To_Big_Integer (Y);

This is what we call a lemma in SPARK. Lemmas are ghost procedures, where preconditions are the hypotheses, and postconditions are the conclusions. Some lemmas are proved automatically, while others require solvers to be guided. You can do this by adding a non-null body to the procedure, and guide them differently, depending on the conclusion you want to prove.

I will talk about the body of Prove_Add in a few paragraphs.

Multiply

Multiply was much harder than Add, but I had the most fun proving it. I refer to the implementation in curve25519-donna as the "inlined" one, because it is fully explicit and doesn't go through loops. This causes a problem for the first point of my method: it is not possible to track what the routine does, except by adding assertions at every line of code, which is not a very interesting approach, proof-wise. So I decided to change the implementation to make it easier to understand. The most natural approach to multiplication, in my opinion, is to use distributivity of the product over addition. The resulting implementation is similar to TweetNaCl's implementation of curve25519 (see the M function):

function Multiply (X, Y : Integer_255) return Product_Integer is
   Product : Product_Integer := (others => 0);
begin
   for J in Index_Type loop
      for K in Index_Type loop
         Product (J + K) :=
           Product (J + K)
           + X (J) * Y (K)
             * (if J mod 2 = 1 and then K mod 2 = 1 then 2 else 1);
      end loop;
   end loop;
   return Product;
end Multiply;

Inside the outer loop, J is fixed and we iterate over all values of K, hence over all the values of Y; this is repeated over the full range of J, hence over the entire content of X. With this implementation, it is possible to track the content of the array in loop invariants through an auxiliary function, Partial_Product, which gives the value of a certain index of the array at each iteration. We add the following loop invariant at the end of the inner loop:

pragma Loop_Invariant (for all L in 0 .. K =>
                         Product (J + L) = Partial_Product (X, Y, J, L));
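To see why the extra factor 2 makes the specification hold, here is a small Python model of this schoolbook multiplication (my own sketch, reusing the conversion formula from the paper):

```python
import math

def conv(i):
    # Weight of limb i: 2**ceil(25.5 * i)
    return 1 << math.ceil(25.5 * i)

def to_int(limbs):
    # Big integer represented by a limb array
    return sum(a * conv(i) for i, a in enumerate(limbs))

def multiply(x, y):
    # Mirror of the Ada loops: the factor 2 appears exactly when both
    # indices are odd, since then conv(j) * conv(k) == 2 * conv(j + k).
    product = [0] * 19
    for j in range(10):
        for k in range(10):
            product[j + k] += x[j] * y[k] * (2 if j % 2 == 1 and k % 2 == 1
                                             else 1)
    return product

x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
y = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8]
assert to_int(multiply(x, y)) == to_int(x) * to_int(y)
```

The final assertion is exactly the postcondition of Multiply, checked on one concrete pair of inputs.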

The function Partial_Product is defined recursively, because it is a sum of several factors.

function Partial_Product
  (X, Y : Integer_255;
   J, K : Index_Type)
   return Long_Long_Integer
is
  (if K = 9 or else J = 0
   then (if J mod 2 = 1 and then K mod 2 = 1 then 2 else 1) * X (J) * Y (K)
   else Partial_Product (X, Y, J - 1, K + 1)
        + (if J mod 2 = 1 and then K mod 2 = 1 then 2 else 1) * X (J) * Y (K));

As seen in the loop invariant above, the function returns Product (J + K), which is modified for all indexes L and M when J + K = L + M. The recursive call will be on the pair J + 1, K - 1, or J - 1, K + 1. Given how the function is designed in the implementation, the choice J - 1, K + 1 is preferable, because it follows the evolution of Product (J + K). The base case is when J = 0 or K = 9.

Problems and techniques

Defining functions to track content was rather easy to do for both functions. Proving that the content is actually equal to the function was a bit more difficult for the Multiply function, but not as difficult as proving the equivalence between this and the postcondition. With these two functions, we challenge the provers in many ways:

• We use recursive functions in contracts, which means provers have to reason inductively, and they struggle with this. That's why we need to guide them in order to prove properties that involve recursive functions.
• The context size is quite big, especially for Multiply. The context represents all the variables, axioms and theories that solvers are given to prove a check. It grows with the size of the code, and sometimes it becomes too big: solvers get lost and fail to prove the check. If the context is reduced, the solvers' job is easier, and they may be able to prove the previously unproved checks.
• Multiply's contracts require reasoning in non-linear integer arithmetic, which provers have a lot of trouble with. This problem is specific to Multiply, and it explains why this function took some time to prove: solvers need to be guided quite a bit in order to prove certain properties.

What follows is a collection of techniques that I found interesting and that might be useful when trying to prove other problems, even ones very different from this one.

Axiom instantiation

When I tried to prove loop invariants in Multiply to track the value of Product (J + K), solvers were unable to use the definition of Partial_Product. Even asserting the exact same definition failed to be proven. This is mainly due to context size: solvers are not able to find the axiom in the search space, and the proof fails. The workaround I found is to create a Ghost procedure which has the definition as its postcondition, and a null body, like this:

procedure Partial_Product_Def
  (X, Y : Integer_255;
   J, K : Index_Type)
with
  Pre  => All_In_Range (X, Y, Min_Multiply, Max_Multiply),
  Post =>
    (if K = Index_Type'Min (9, J + K)
     then Partial_Product (X, Y, J, K)
          = (if J mod 2 = 1 and then K mod 2 = 1 then 2 else 1)
            * X (J) * Y (K)
     else Partial_Product (X, Y, J, K)
          = Partial_Product (X, Y, J - 1, K + 1)
            + X (J) * Y (K)
              * (if J mod 2 = 1 and then K mod 2 = 1 then 2 else 1));

procedure Partial_Product_Def
  (X, Y : Integer_255;
   J, K : Index_Type)
is null;

In this case, the context size is reduced considerably, and provers are able to prove the procedure even with a null body. During the proof process, I had to create other recursive functions that needed such a *_Def procedure in order to use their definition. It has to be instantiated manually, but it is a way to "remind" solvers of this axiom. A simple but very useful technique.

Manual induction

When trying to prove Prove_Add, I encountered one of the cases where the provers have to reason inductively: a finite sum is defined recursively. To help them prove the postcondition, the conversion is computed incrementally in a loop, and the loop invariant tracks the evolution.

procedure Prove_Add (X, Y, Sum : Integer_255) with
  Ghost,
  Pre  => (for all J in Sum'Range => Sum (J) = X (J) + Y (J)),
  Post => To_Big_Integer (Sum) = To_Big_Integer (X) + To_Big_Integer (Y);
--  Just to remember

procedure Prove_Add (X, Y, Sum : Integer_255) with
  Ghost
is
   X_255, Y_255, Sum_255 : Big_Integer := Zero;
begin
   for J in Sum'Range loop
      X_255   := X_255 + (+X (J)) * Conversion_Array (J);
      Y_255   := Y_255 + (+Y (J)) * Conversion_Array (J);
      Sum_255 := Sum_255 + (+Sum (J)) * Conversion_Array (J);

      pragma Loop_Invariant (X_255 = Partial_Conversion (X, J));
      pragma Loop_Invariant (Y_255 = Partial_Conversion (Y, J));
      pragma Loop_Invariant (Sum_255 = Partial_Conversion (Sum, J));
      pragma Loop_Invariant (Partial_Conversion (Sum, J) =
                             Partial_Conversion (X, J) +
                             Partial_Conversion (Y, J));
   end loop;
end Prove_Add;

What makes this proof inductive is the treatment of Loop_Invariants by SPARK: it will first try to prove the first iteration, as an initialization, and then it will try to prove iteration N knowing that the property is true for N - 1.

Here, the final loop invariant is what we want to prove, because when J = 9, it is equivalent to the postcondition. The other loop invariants and variables are used to compute incrementally the values of partial conversions and facilitate proof.
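The inductive scheme can be replayed concretely. Here is a Python sketch (my own illustration) of what the loop in Prove_Add establishes at each iteration:

```python
import math

def conv(i):
    # Weight of limb i: 2**ceil(25.5 * i)
    return 1 << math.ceil(25.5 * i)

def check_prove_add(x, y):
    """At iteration J, the running totals equal the partial conversions up
    to index J, and the invariant Sum = X + Y holds; at J = 9 this is
    exactly Prove_Add's postcondition."""
    s = [a + b for a, b in zip(x, y)]
    x_255 = y_255 = sum_255 = 0
    for j in range(10):
        x_255 += x[j] * conv(j)
        y_255 += y[j] * conv(j)
        sum_255 += s[j] * conv(j)
        assert sum_255 == x_255 + y_255  # the final loop invariant
    return x_255, y_255, sum_255

x_255, y_255, sum_255 = check_prove_add(list(range(10)), [7] * 10)
assert sum_255 == x_255 + y_255
```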

Fortunately, in this case, provers do not need more guidance to prove the postcondition.  Prove_Multiply also follows this proof scheme, but is much more difficult. You can access its proof following this link.

Another case of inductive reasoning in my project is a lemma whose proof presents a very useful technique for proving algorithms that use recursive ghost functions in contracts.

procedure Equal_To_Conversion
  (A, B : Integer_Curve25519;
   L    : Product_Index_Type)
with
  Pre  =>
    A'Length > 0
    and then A'First = 0
    and then B'First = 0
    and then B'Last <= A'Last
    and then L in B'Range
    and then (for all J in 0 .. L => A (J) = B (J)),
  Post => Partial_Conversion (A, L) = Partial_Conversion (B, L);

It states that given two Integer_Curve25519 arrays A and B and a Product_Index_Type L, if A (0 .. L) = B (0 .. L), then Partial_Conversion (A, L) = Partial_Conversion (B, L). To us the proof is evident, because it goes by induction over L, but we have to help SPARK a bit, even though it is usually simple.

procedure Equal_To_Conversion
  (A, B : Integer_Curve25519;
   L    : Product_Index_Type)
is
begin
   if L = 0 then
      return;                         --  Initialization of the lemma
   end if;
   Equal_To_Conversion (A, B, L - 1); --  Calling the lemma for L - 1
end Equal_To_Conversion;

The body of a procedure proved by induction actually looks like an induction: the first thing to add is the initialization (the base case), an if statement ending with a return. The code that follows is the general case: we call the same lemma for L - 1, and we add assertions to prove the postcondition if necessary. In this case, calling the lemma at L - 1 was sufficient to prove the postcondition.

Guide with assertions

Even if the solvers know the relation between Conversion_Array (J + K) and Conversion_Array (J) * Conversion_Array (K), it is hard for them to prove properties requiring non-linear arithmetic reasoning. The following procedure is a nice example:

procedure Split_Product
  (Old_Product, Old_X, Product_Conversion : Big_Integer;
   X, Y                                   : Integer_255;
   J, K                                   : Index_Type)
with
  Ghost,
  Pre  =>
    Old_Product
      = Old_X * (+Y)
        + (+X (J))
          * (if K = 0
             then Zero
             else Conversion_Array (J) * Partial_Conversion (Y, K - 1))
    and then
      Product_Conversion
        = Old_Product
          + (+X (J)) * (+Y (K))
            * (+(if J mod 2 = 1 and then K mod 2 = 1 then 2 else 1))
            * Conversion_Array (J + K),
  Post =>
    Product_Conversion
      = Old_X * (+Y)
        + (+X (J)) * Conversion_Array (J)
          * Partial_Conversion (Y, K);

The preconditions imply the postcondition by arithmetic reasoning. With a null body, the procedure is not proved. We can guide provers through assertions, by splitting the proof into two different cases:

procedure Split_Product
  (Old_Product, Old_X, Product_Conversion : Big_Integer;
   X, Y                                   : Integer_255;
   J, K                                   : Index_Type)
is
begin
   if J mod 2 = 1 and then K mod 2 = 1 then
      pragma Assert (Product_Conversion
                     = Old_Product
                       + (+X (J)) * (+Y (K))
                         * Conversion_Array (J + K) * (+2));
      pragma Assert ((+2) * Conversion_Array (J + K)
                     = Conversion_Array (J) * Conversion_Array (K));
      --  Case where Conversion_Array (J + K) * 2
      --  = Conversion_Array (J) * Conversion_Array (K).
   else
      pragma Assert (Conversion_Array (J + K)
                     = Conversion_Array (J) * Conversion_Array (K));
      --  Other case
   end if;
   if K > 0 then
      pragma Assert (Partial_Conversion (Y, K)
                     = Partial_Conversion (Y, K - 1)
                       + (+Y (K)) * Conversion_Array (K));
      --  Definition of Partial_Conversion, needed for the proof
   end if;
end Split_Product;

What is interesting about this technique is that it shows that to prove something with auto-active proof, you first need to understand it yourself. I proved this lemma by hand on paper, which was not difficult, and then rewrote my manual proof as assertions. If my manual proof is easier when split into two cases, so it should be for the provers: it lets them choose the first good steps of the proof.
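The case split mirrors an arithmetic fact about the conversion weights that is easy to check exhaustively (again my own check, outside SPARK):

```python
import math

def conv(i):
    # Weight of limb i: 2**ceil(25.5 * i)
    return 1 << math.ceil(25.5 * i)

for j in range(10):
    for k in range(10):
        if j % 2 == 1 and k % 2 == 1:
            # First branch of Split_Product: the factor 2 is needed.
            assert 2 * conv(j + k) == conv(j) * conv(k)
        else:
            # Other case: the weights multiply exactly.
            assert conv(j + k) == conv(j) * conv(k)
```

Both of Split_Product's assertions about Conversion_Array are instances of this identity.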

Split into subprograms

Another thing I noticed is that provers were very quickly overwhelmed by context size in my project. I had a lot of recursive functions in my contracts, but also quantifiers... I did not hesitate to split proofs into ghost procedures in order to reduce context size, and also to wrap some expressions into functions.

I have 7 subprograms that enable me to prove Prove_Multiply, which is not a lot in my opinion. It also increases readability, which is important if you want other people to read your code.

There is also another method I want to share to overcome this problem. When there is only one assertion to prove, but it requires a lot of guidance, it is possible to put all the assertions in a begin ... end block, with the last one in an Assert_And_Cut pragma, like this:

code ...
--  It is possible to add the declare keyword to declare variables
--  before the block, if it helps the proof.
begin
   pragma Assert (one assertion needed to prove the last assertion);
   pragma Assert (another one);
   pragma Assert_And_Cut (assertion to prove and remember);
end;

Assert_And_Cut asks the provers to prove the property inside it, but then removes all the context created in the begin ... end block: only this assertion is kept, which can help reduce the context.

Sadly, this workaround didn't work for my project, because context was already too big to prove the first assertions. But adding a lot of subprograms also has drawbacks, e.g. you have to write preconditions to the procedures, and this may be more difficult than just writing the block with an Assert_And_Cut. Surely, both are useful in various projects, and I think it's nice to have these methods in mind.

Takeaways

For statistics only: Add has an 8-line implementation and around 25 lines of ghost code to prove it. Multiply has 12 lines of implementation for more than 500 lines of proof. And it's done with auto-active proof, which means that verification was all discharged automatically! In comparison, the verification of TweetNaCl's curve25519 needs more than 700 lines of Coq to prove Multiply for an implementation of 5 lines, though they also have a carry operation to prove on top of the product.

I would say this challenge is at an intermediate level, because the proof of Multiply is not difficult to understand once you write it down. But it presents several techniques that apply to various other problems, and to me this is the main interest of the project.

As in Adam Langley's blog post, I tried to prove my project with Alt-Ergo only, because it was the only prover available for SPARK back in 2014. Even today, Alt-Ergo alone is unable to prove all the code. This doesn't make Alt-Ergo a bad prover; in fact, none of the provers available in SPARK (CVC4, Z3 and Alt-Ergo) is able to prove the entire project on its own. I think this shows that having multiple provers available greatly increases the chances of code being proved.

At the beginning, working on this project was just a way to use my big integers library in proof. But in the end, I believe it is an interesting take on the challenge of verifying elliptic-curve functions, especially as similar projects appear, for example fiat-crypto or the verification of TweetNaCl's curve25519, and I had a lot of fun experimenting with SPARK to prove properties that are usually badly handled by provers. You can access the full proof of my project in this repository.


This course is geared to software professionals looking for a practical introduction to the Ada language with a focus on embedded systems, including real-time features as well as critical features introduced in Ada 2012. By attending this course you will understand and know how to use Ada for both sequential and concurrent applications, through a combination of live lectures from AdaCore's expert instructors and hands-on workshops using AdaCore's latest GNAT technology. AdaCore will provide an Ada 2012 tool-chain and ARM-based target boards for embedded workshops. No previous experience with Ada is required.

The course will be conducted in English.

Prerequisite: Knowledge of a programming language (Ada 83, C, C++, Java…)

Each participant should come with a computer running Windows.


A question that our users sometimes ask us is "do you use CodePeer at AdaCore and if so, how?". The answer is yes! and this blog post will hopefully give you some insights into how we are doing it for our own needs.

First, I should note that at AdaCore we are in a special situation, since we both develop lots of Ada code and, at the same time, develop and evolve CodePeer itself. One consequence is that using CodePeer on our own code actually has two purposes: one is to improve the quality of our code by using an advanced static analyzer, and the other is to eat our own dog food and improve the analyzer itself by finding limitations, sub-optimal usage, etc.

In the past few years, we've gradually added automated runs of CodePeer on many of our Ada code bases, in addition to the systematic use of the light static analysis performed by the compiler: GNAT already provides many useful built-in checks, such as all the static Ada legality rules, as well as many clever warnings fine-tuned over the past 25 years and available via the -gnatwa compiler switch, coupled with -gnatwe, which transforms warnings into errors, not to mention the built-in style checks available via the -gnaty switch.

GNAT

For GNAT sources (the Ada front end and runtime source code), given that this is a large and complex piece of Ada code (compilers do come with a high level of algorithmic complexity and recursion!) and that we wanted rapid and regular feedback, we've settled on running CodePeer at level 1 with some extra fine tuning: we found that some categories of messages didn't generate any extra value for the kind of code found in GNAT, so we disabled these categories via the --be-messages switch. We also excluded from the analysis some files that took a long time to analyze compared to the others, for little benefit, via the Excluded_Source_Files project attribute, as explained in the Partial Analysis section of the User's Guide.

Given that the default build mechanism of GNAT is based on Makefiles, we've written a separate project file for performing the CodePeer analysis, coupled with a separate simple Makefile that performs the necessary setup phase (automatic generation of some source files in the case of GNAT sources).

After this fine tuning, CodePeer generated a few dozen useful messages that we analyzed, allowing us to fix potential issues and improve the general quality of the code. You'll find a few examples below. The most useful findings on the GNAT code base are related to redundant or dead code and to potential access to uninitialized variables when complex conditions are involved, which is often the case in a compiler! CodePeer also detected useful opportunities for code cleanups, in particular related to wrong parameter modes: in out parameters that should be out, or in parameters that should be in out.

Once we had addressed these findings by improving the source code, we ended up in a nice situation where the CodePeer run was completely clean: no new messages found. So we decided on a strict policy, similar to our no-warning policy: no CodePeer message should be left unaddressed moving forward. To ensure that, we've put in place a continuous run that triggers after each commit in the GNAT repository and reports back its findings within half an hour.

One of the most useful categories of potential issues found by CodePeer in our case is related to not always initializing a variable before using it (what CodePeer calls validity checks). For example in gnatchop.adb we had the following code:

Offset_FD : File_Descriptor;
[...]

exception
   when Failure | Types.Terminate_Program =>
      Close (Offset_FD);
[...]

CodePeer detected that Offset_FD may not have been initialized at the Close call, precisely because an exception may have been raised before Offset_FD was assigned. We fixed this potential issue by explicitly assigning a default value and testing for it:

Offset_FD : File_Descriptor := Invalid_FD;
[...]

exception
   when Failure | Types.Terminate_Program =>
      if Offset_FD /= Invalid_FD then
         Close (Offset_FD);
      end if;
[...]

CodePeer also helped detect suspicious or dead code that should not have been there in the first place. Here is an example of code smell that CodePeer detected in the file get_scos.adb:

procedure Skip_EOL is
   C : Character;
begin
   loop
      Skipc;
      C := Nextc;
      exit when C /= LF and then C /= CR;

      if C = ' ' then
         Skip_Spaces;
         C := Nextc;
         exit when C /= LF and then C /= CR;
      end if;
   end loop;
end Skip_EOL;

In the code above, CodePeer complained that the test at line 9 (C = ' ') is always false, because C is either LF or CR at that point. And indeed, if you look closely at line 7, when the code goes past this line, C will always be either CR or LF, and therefore cannot be a space character. The code was simplified into:

procedure Skip_EOL is
   C : Character;
begin
   loop
      Skipc;
      C := Nextc;
      exit when C /= LF and then C /= CR;
   end loop;
end Skip_EOL;

GPS

Analysis of the GPS sources with CodePeer is used at AdaCore both to improve code quality and to test our SonarQube integration via GNATdashboard.

For GPS, we are using CodePeer in two modes. In the first, developers manually run CodePeer on their local setup at level 0. This mode takes about 3 to 5 minutes to analyze all the GPS sources and runs the first level of checks provided by CodePeer, a very useful complement to the compiler warnings and style checks that has allowed us to stay 100% clean of messages after an initial set of code cleanups.

In addition, an automated run is performed nightly on a server using level 1, further tuned in a similar way to what we did for GNAT. Here we have some remaining messages under analysis and we use SonarQube to track and analyze these messages.

Here is an example of code that looked suspicious in GPS sources:

   View : constant Registers_View := Get_Or_Create_View (...);
begin
   View.Locked := True;

   if View /= null then
      [...]
   end if;

CodePeer complained at line 5 that the test is always True since View cannot be null at this point. Why? Because at line 3 we are already dereferencing View, so CodePeer knows that after this point either an exception was raised or, if not, View cannot be null anymore.

In this case, we've replaced the test by an explicit assertion since it appears that Get_Or_Create_View can never return null:

   View : constant Registers_View := Get_Or_Create_View (...);
begin
   pragma Assert (View /= null);
   View.Locked := True;
   [...]

Run Time Certification

As part of a certification project for one of our embedded runtimes for a bare metal target, we ran CodePeer at its highest level (4) in order to detect all potential occurrences of a number of vulnerabilities, in particular validity checks, divide by zero, and overflow checks, as well as to confirm that the runtime did not contain dead code or unused assignments. CodePeer was run manually, and all messages produced were then reviewed and justified as part of our certification work.
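The exact configuration used for this certification run isn't shown in the post; by analogy with the project-file settings shown for GNAT LLVM below, a maximum-depth run could be requested with a fragment along these lines (illustrative only, not the actual certification project file):

```ada
package CodePeer is
   --  Illustrative settings: run the deepest analysis level (4).
   --  Dead code and unused assignments are then reviewed from the
   --  messages CodePeer reports at this level.
   for Switches use ("-level", "4");
end CodePeer;
```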

GNAT LLVM

As part of the GNAT LLVM project (more details on this project, if you're curious, in a future post!) in its early stages, we ran CodePeer manually on all the GNAT LLVM sources - excluding the sources common to GNAT, already analyzed separately - initially at level 1 and then at level 3, and we concentrated on analyzing all (and only) the potential uninitialized variables (validity checks). In this case we used the same project file used to build GNAT LLVM itself and added CodePeer settings which basically looked like:

package CodePeer is
   for Switches use ("-level", "3", "--be-messages=validity_check", "--no-lal-checkers");
   for Excluded_Source_Dirs use ("gnat_src", "obj");
end CodePeer;

which allowed us to perform a number of code cleanups and to review more closely the code pointed out by CodePeer.

CodePeer

Last but not least, we also run CodePeer on its own code base! Given that CodePeer is another large and complex piece of Ada code, and that we wanted to favor rapid and regular feedback, we've settled on a setting similar to GNAT's: level 1 with some extra fine tuning via --be-messages. We also added a few pragma Annotate to justify some messages - pragma Annotate (CodePeer, False_Positive) - as well as to skip the analysis of some subprograms or files where CodePeer was taking too long for little benefit (via either the Excluded_Source_Files project attribute or pragma Annotate (CodePeer, Skip_Analysis)).
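For illustration, the two kinds of annotations mentioned above look roughly like this (the subprogram names, check name, and justification text here are invented, not taken from the CodePeer sources):

```ada
--  Justify a specific message on the preceding statement:
procedure Demo (Y, Z : Integer; X : out Integer) is
begin
   X := Y / Z;
   pragma Annotate
     (CodePeer, False_Positive, "divide by zero",
      "Z is checked to be non-zero by all callers");
end Demo;

--  Tell CodePeer not to analyze this subprogram at all:
procedure Slow_To_Analyze is
   pragma Annotate (CodePeer, Skip_Analysis);
begin
   null;  --  body skipped by the analysis
end Slow_To_Analyze;
```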

A CodePeer run is triggered after each change in the repository in a continuous builder, and its results are made available to the team within 30 minutes. We've found that in this case the most interesting messages were: validity checks on local variables and out parameters, tests always true/false, duplicated code, and potential wrong parameter modes.

We also run CodePeer on other code bases in a similar fashion, such as analysis of the SPARK tool sources.

As part of integrating CodePeer in our daily work, we also took the opportunity to improve the documentation and describe many possible workflows corresponding to the various needs of teams wanting to analyze Ada code, with an explanation of how to put these various scenarios in place. Check Chapter 5 of the User's Guide if you're curious.

What about you? Don't hesitate to tell us in the comments below how you are using CodePeer and which benefits you find the most useful.

]]>
Ten Years of Using SPARK to Build CubeSat Nano Satellites With Students https://blog.adacore.com/ten-years-of-using-spark-to-build-cubesat-nano-satellites-with-students Fri, 01 Mar 2019 19:31:01 +0000 Peter Chapin https://blog.adacore.com/ten-years-of-using-spark-to-build-cubesat-nano-satellites-with-students

My colleague, Carl Brandon, and I have been running the CubeSat Laboratory at Vermont Technical College (VTC) for over ten years. During that time we have worked with nearly two dozen students on building and programming CubeSat nano satellites. CubeSats are small (usually 10cm cube), easily launched spacecraft that can be outfitted with a variety of cameras, sensing instruments, and communications equipment. Many CubeSats are built by university groups like ours using students at various skill levels in the design and production process.

Students working in the CubeSat Laboratory at VTC have been drawn from various disciplines including computer engineering, electro-mechanical engineering, electrical engineering, and software engineering. VTC offers a master's degree in software engineering, and two of our MSSE students have completed master's projects related to CubeSat flight software. In fact, our current focus is on building a general purpose flight software framework called CubedOS.

Like all spacecraft, CubeSats are difficult to service once they are launched. Because of the limited financial resources available to university groups, and because of on-board resource constraints, CubeSats typically don't support after-launch uploading of software updates. This means the software must be fully functional and fault-free, with no possibility of being updated, at the time of launch.

Many university CubeSat missions have failed due to software errors. This is not surprising considering that most flight software is written in C, a language that is difficult to use correctly. To mitigate this problem we use the SPARK dialect of Ada in all of our software work. Using the SPARK tools we work toward proving the software free of runtime errors, meaning that no runtime exceptions will occur. However, in general we have not attempted to prove functional correctness properties, relying instead on conventional testing for that level of verification.

Although we do have some graduate students working in the CubeSat Laboratory, most of our students are third and fourth year undergraduates. The standard curriculum for VTC's software engineering program includes primarily the Java and C languages. Although I have taught Ada in the classroom, it has only been in specialized classes that a limited number of students take. As a result most of the students who come to the CubeSat Laboratory have no prior knowledge of SPARK or Ada, although they have usually taken several programming courses in other languages.

Normally I arrange to give new students an intensive SPARK training course that spans three or four days, with time for them to do some general exercises. After that I assign the students a relatively simple introductory task involving our codebase so they can get used to the syntax and semantics of SPARK without the distraction of complex programming problems. Our experience has been that good undergraduate students with a reasonable programming background can become usefully productive with SPARK in as little as two weeks. Of course, it takes longer for them to gain the skills and experience to tackle the more difficult problems, but the concern commonly expressed that a lack of SPARK skills in a programming team is a barrier to the adoption of SPARK is not borne out by our experience.

Students are, of course, novice programmers almost by definition. Many of our students are in the process of learning basic software engineering principles such as the importance of requirements, code review, testing, version control, continuous integration, and many other things. Seeing these ideas in the context of our CubeSat work gives them an important measure of realism that can be lacking in traditional courses.

However, because of their general inexperience, and because of the high student turnover rate that is natural in an educational setting, our development process is often far from ideal. Here SPARK has been extremely valuable to us. What we lack in rigor of the development process we make up for in the rigor of the SPARK language and tools. For example, if a student refactors some code, perhaps without adequately communicating with the authors of the adjoining code, the SPARK tools will often find inconsistencies that would otherwise go unnoticed. This has resulted in a much more disciplined progression of the code than one would expect based on the team's overall culture. Moreover the discipline imposed by SPARK and its tools can serve to educate the students about the kinds of issues that can go wrong by following an overly informal approach to development.

For example, one component of CubedOS is a module that sends “tick” messages to other modules on request. These messages are largely intended to trigger slow, non-timing critical housekeeping tasks in the other modules. For flexibility the module supports both one-shot tick messages that occur only once and periodic tick messages. Many “series” of messages can be active at once. The module can send periodic tick messages with different periods to several receivers with additional one-shot tick messages waiting to fire as well.

The module has been reworked many times. At one point it was decided that tick messages should contain a counter that represents the number of such messages that have been sent in the series. Procedure Next_Ticks below is called at a relatively high frequency to scan the list of active series and issue tick messages as appropriate. I asked a new student to add support for the counters to this system as a simple way of getting into our code and helping us to make forward progress. The version produced was something like:

procedure Next_Ticks is
begin
   -- Iterate through the array to see who needs a tick message.
   for I in Series_Array'Range loop
      declare
         Current_Series : Series_Record renames Series_Array (I);
      begin
         -- If we need to send a tick from this series...
         if Current_Series.Is_Used and then Current_Series.Next <= Current_Time then
            Route_Message
              (Tick_Reply_Encode
                 (Request_ID => 0,
                  Series_ID  => Current_Series.ID,
                  Count      => Current_Series.Count));

            -- Update the current record.
            case Current_Series.Kind is
               when One_Shot =>
                  -- TODO: Should we reinitialize the rest of the series record?
                  Current_Series.Is_Used := False;

               when Periodic =>
                  Current_Series.Count := Current_Series.Count + 1;
                  Current_Series.Next :=
                    Current_Series.Next + Current_Series.Interval;
            end case;
         end if;
      end;
   end loop;
end Next_Ticks;

The code iterates over an array of Series_Record objects looking for the records that are active and are due to be ticked (Current_Series.Next <= Current_Time). For those records, it invokes the Route_Message procedure on an appropriately filled in Tick Reply message as created by function Tick_Reply_Encode. The student had to modify the information saved for each series to contain a counter, modify the Tick Reply message to contain a count, and update Tick_Reply_Encode to allow for a count parameter. The student also needed to increment the count as needed for periodic messages. This involved relatively minor changes to a variety of areas in the code.

The SPARK tools ensured that the student dealt with initialization issues correctly. But they also found an issue in the code above with surprisingly far reaching consequences. In particular, SPARK could not prove that no runtime error could ever occur on the line

    Current_Series.Count := Current_Series.Count + 1;

Of course, this will raise Constraint_Error when the count overflows the range of the data type used. If periodic tick messages are sent for too long, the exception would eventually be raised, potentially crashing the system. Depending on the rate of tick messages, that might occur in as little as a couple of years, a very realistic amount of time for the duration of a space mission. It was also easy to see how the error could be missed during ordinary testing. A code review might have discovered the problem, but we did code reviews infrequently. On the other hand, SPARK made this problem immediately obvious and led to an extensive discussion about what should be done in the event of the tick counters overflowing.
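The post doesn't say which resolution was chosen; one common option for this kind of finding is to make the counter saturate instead of overflowing, along the lines of the sketch below (hypothetical; the name Count_Type for Count's subtype is invented):

```ada
when Periodic =>
   --  Hypothetical fix: stop incrementing at the top of the range
   --  rather than raising Constraint_Error after years of uptime.
   if Current_Series.Count < Count_Type'Last then
      Current_Series.Count := Current_Series.Count + 1;
   end if;
   Current_Series.Next :=
     Current_Series.Next + Current_Series.Interval;
```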

In November 2013 we launched a low Earth orbiting CubeSat. The launch vehicle contained 13 other university-built CubeSats. Most were never heard from. One worked for a few months. Ours worked for two years until it reentered Earth's atmosphere as planned in November 2015. Although the reasons for the other failures are not always clear, software problems were known to be an issue in at least one of them and probably for many others. We believe the success of our mission, particularly in light of the small size and experience of our student team, is directly attributable to the use of SPARK.

]]>

MISRA C is the most widely known coding standard restricting the use of the C programming language for critical software. For good reasons. For one, its focus is entirely on avoiding error-prone programming features of the C programming language rather than on enforcing a particular programming style. In addition, a large majority of rules it defines are checkable automatically (116 rules out of the total 159 guidelines), and many tools are available to enforce those. As a coding standard, MISRA C even goes out of its way to define a consistent sub-language of C, with its own typing rules (called the "essential type model" in MISRA C) to make up for the lack of strong typing in C.

That being said, it's still a long shot to call it a security coding standard for C, even when taking into account the 14 additional guidelines focusing on security in "MISRA C:2012 - Amendment 1: Additional security guidelines for MISRA C:2012". MISRA C is first and foremost focused on software quality, which has obvious benefits for security, but programs in MISRA C remain for the most part vulnerable to the major security vulnerabilities that plague C programs.

In particular, it's hard to state what guarantees are obtained when respecting the MISRA C rules (which means essentially respecting the 116 decidable rules enforced automatically by analysis tools). In order to clarify this, and to present at the same time how guarantees can be obtained using a different programming language, we have written a book available online. Even better, we host on our e-learning website an interactive version of the book where you can compile C or Ada code, and analyze SPARK code, to experiment how a different language with its associated analysis toolset can go beyond what MISRA C allows.

So even if MISRA C is the best thing that could happen to C, you can decide whether C is really the best thing that could happen to your software.

]]>

Like last year, we've sent a squad of AdaCore engineers to participate in the celebration of Open Source software at FOSDEM. Like last year, we had great interactions with the rest of the Ada and SPARK Community in the Ada devroom on Saturday. You can check the program with videos of all the talks here. This year's edition was particularly diverse, with an academic project from Austria for an autonomous train control in Ada, two talks on Database development and Web development made type-safe with Ada, distributed computing, libraries, C++ binding, concurrency, safe pointers, etc.

We also had a talk in the RISC-V devroom:

And there was a related talk in the Security devroom on the use of SPARK for security:

Hope to see you at FOSDEM next year!

]]>

In Part 1 of this blog post I discussed why I chose to implement this application using the Ada Web Server to serve the computed fractal to a web browser. In this part I will discuss a bit more about the backend of the application, the Ada part.

Why do we care about performance?

The Ada backend computes a new image each time one is requested by the front-end, and the front-end requests a new image immediately after it receives the last one it requested. Ideally, this presents to the user as an animation of a Mandelbrot fractal changing colors. If the update is too slow, the animation will look choppy and terrible. So we want to minimize the compute time as much as possible, and we have two ways to do that: optimize the computation, and parallelize it.

Parallelizing the Fractal Computation

An interesting feature of the Mandelbrot calculation is that the computation of any pixel is independent of every other pixel. That means we can completely parallelize the computation of each pixel! So if our requested image is 1920x1280 pixels, we can spawn 2,457,600 tasks, right? Theoretically, yes, we can do that. But it doesn’t necessarily speed up our application compared to, let’s say, 8 or 16 tasks, each of which computes a row or selection of rows. Either way, we know we need to create what’s called a task pool: a group of tasks that can be queued up as needed to do some calculations. We will create our task pool by declaring a task type, which implements the actual activity that each task will perform, and we will use a Synchronous_Barrier to synchronize all of the tasks back together.

task type Chunk_Task_Type is
   pragma Priority (0);
   entry Go (Start_Row : Natural;
             Stop_Row  : Natural;
             Buf       : Stream_Element_Array_Access);
end Chunk_Task_Type;

type Chunk_Task is record
   T         : Chunk_Task_Type;
   Start_Row : Natural;
   Stop_Row  : Natural;
end record;

S_Sync_Obj : Synchronous_Barrier (Release_Threshold => Task_Pool_Size + 1);

This snippet is from fractal.ads. S_Task_Pool will be our task pool and S_Sync_Obj will be our synchronization object. Notice that each Chunk_Task_Type in S_Task_Pool takes an access to a buffer in its Go entry. In our implementation, each task will have an access to the same buffer. Isn’t this a race condition? Shouldn’t we use a protected object?

The downside of protected objects

The answer to both of those questions is yes. This is a race condition and we should be using a protected object. If you run CodePeer on this project, it identifies this as a definite problem. But using a protected object would destroy our performance. The reason is that, under the hood, the protected object uses locks each time we access data in the buffer. Each lock, unlock, and wait-on-lock call is going to make the animation of our fractal look worse and worse. However, there is a way around this race condition: by design, we can guarantee that each task only accesses the buffer from Start_Row to Stop_Row. So, by design, we can make sure that each task’s rows don’t overlap another task’s rows, thereby avoiding a race condition.

Parallelization Implementation

Now that we understand the specification of the task pool, let’s look at the implementation.

task body Chunk_Task_Type is
   Start    : Natural;
   Stop     : Natural;
   Buffer   : Stream_Element_Array_Access;
   Notified : Boolean;
begin
   loop
      accept Go (Start_Row : Natural;
                 Stop_Row  : Natural;
                 Buf       : Stream_Element_Array_Access)
      do
         Start  := Start_Row;
         Stop   := Stop_Row;
         Buffer := Buf;
      end Go;

      for I in Start .. Stop loop
         Calculate_Row
           (Y      => I,
            Idx    => Buffer'First +
                        Stream_Element_Offset
                          ((I - 1) * Get_Width * Pixel'Size / 8),
            Buffer => Buffer);
      end loop;

      Wait_For_Release (The_Barrier => S_Sync_Obj,
                        Notified    => Notified);
   end loop;
end Chunk_Task_Type;

Now we come back to the size of the task pool. Based on the implementation above we can chunk the processing up to the granularity of at least the number of rows being requested. So in the case of a 1920x1280 image, we could have 1280 tasks! But we have to ask ourselves, is that going to give us better performance than 8 or 16 tasks? The answer, unfortunately, is probably not. If we create 8 tasks, and we have an 8 core processor, we can assume that some of our tasks are going to execute on different cores in parallel. If we create 1280 tasks and we use the same 8 core processor, we don’t get much more parallelization than with 8 tasks. This is a place where tuning and best judgement will give you the best performance.
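The dispatching side isn't shown in the post; assuming S_Task_Pool is an array of Chunk_Task_Type, Buffer is the shared image buffer, and the image height divides evenly among the tasks, handing out rows and waiting on the barrier might look like this sketch:

```ada
--  Hypothetical dispatcher: give each task an equal slice of rows,
--  then join the barrier; it releases once all tasks plus the
--  dispatcher (Task_Pool_Size + 1 parties) have arrived.
declare
   Rows_Per_Task : constant Natural := Get_Height / Task_Pool_Size;
   Notified      : Boolean;
begin
   for I in 1 .. Task_Pool_Size loop
      S_Task_Pool (I).Go
        (Start_Row => (I - 1) * Rows_Per_Task + 1,
         Stop_Row  => I * Rows_Per_Task,
         Buf       => Buffer);
   end loop;

   Wait_For_Release (The_Barrier => S_Sync_Obj,
                     Notified    => Notified);
end;
```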

Fixed vs Floating Point

Now that we have the parallelization component, let’s think about optimizing the maths. In most fractal computations floating point complex numbers are used. Based on our knowledge of processors, we can assume that in most cases floating point calculations will be slower than integer calculations. So theoretically, using fixed point numbers might give us better performance. For more information on Ada fixed point types check out the Fixed-point types section of the Introduction to Ada course on learn.adacore.com .

The Generic Real Type

Because we are going to use the same algorithm for both floating and fixed point math, we can implement the algorithm using a generic type called Real. The Real type is defined in computation_type.ads.

generic
   type Real is private;
   with function "*" (Left, Right : Real) return Real is <>;
   with function "/" (Left, Right : Real) return Real is <>;
   with function To_Real (V : Integer) return Real is <>;
   with function F_To_Real (V : Float) return Real is <>;
   with function To_Integer (V : Real) return Integer is <>;
   with function To_Float (V : Real) return Float is <>;
   with function Image (V : Real) return String is <>;
   with function "+" (Left, Right : Real) return Real is <>;
   with function "-" (Left, Right : Real) return Real is <>;
   with function ">" (Left, Right : Real) return Boolean is <>;
   with function "<" (Left, Right : Real) return Boolean is <>;
   with function "<=" (Left, Right : Real) return Boolean is <>;
   with function ">=" (Left, Right : Real) return Boolean is <>;
package Computation_Type is

end Computation_Type;

We can then create instances of the Julia_Set package using a floating point and fixed point version of the computation_type package.

type Real_Float is new Float;

function Integer_To_Float (V : Integer) return Real_Float is
(Real_Float (V));

function Float_To_Integer (V : Real_Float) return Integer is
(Natural (V));

function Float_To_Real_Float (V : Float) return Real_Float is
(Real_Float (V));

function Real_Float_To_Float (V : Real_Float) return Float is
(Float (V));

function Float_Image (V : Real_Float) return String is
(V'Img);

D_Small : constant := 2.0 ** (-21);
type Real_Fixed is delta D_Small range -100.0 .. 201.0 - D_Small;

function "*" (Left, Right : Real_Fixed) return Real_Fixed;
pragma Import (Intrinsic, "*");

function "/" (Left, Right : Real_Fixed) return Real_Fixed;
pragma Import (Intrinsic, "/");

function Integer_To_Fixed (V : Integer) return Real_Fixed is
(Real_Fixed (V));

function Float_To_Fixed (V : Float) return Real_Fixed is
(Real_Fixed (V));

function Fixed_To_Float (V : Real_Fixed) return Float is
(Float (V));

function Fixed_To_Integer (V : Real_Fixed) return Integer is
(Natural (V));

function Fixed_Image (V : Real_Fixed) return String is
(V'Img);

package Fixed_Computation is new Computation_Type
  (Real       => Real_Fixed,
   "*"        => Router_Cb."*",
   "/"        => Router_Cb."/",
   To_Real    => Integer_To_Fixed,
   F_To_Real  => Float_To_Fixed,
   To_Integer => Fixed_To_Integer,
   To_Float   => Fixed_To_Float,
   Image      => Fixed_Image);

package Fixed_Julia is new Julia_Set
  (CT               => Fixed_Computation,
   Escape_Threshold => 100.0);

package Fixed_Julia_Fractal is new Fractal
  (CT              => Fixed_Computation,
   Calculate_Pixel => Fixed_Julia.Calculate_Pixel,
   Task_Pool_Size  => Task_Pool_Size);

package Float_Computation is new Computation_Type
  (Real       => Real_Float,
   To_Real    => Integer_To_Float,
   F_To_Real  => Float_To_Real_Float,
   To_Integer => Float_To_Integer,
   To_Float   => Real_Float_To_Float,
   Image      => Float_Image);

package Float_Julia is new Julia_Set
  (CT               => Float_Computation,
   Escape_Threshold => 100.0);

package Float_Julia_Fractal is new Fractal
  (CT              => Float_Computation,
   Calculate_Pixel => Float_Julia.Calculate_Pixel,
   Task_Pool_Size  => Task_Pool_Size);

We now have the Julia_Set package instantiated with both the floating point and fixed point implementations. The AWS URI router is set up to serve a floating point image if the GET request URI is “/floating_fractal” and a fixed point image if the request URI is “/fixed_fractal”.

The performance results

Interestingly, fixed point was not unequivocally faster than floating point in every situation. On my 64 bit Mac, the floating point version was slightly faster. On a 64 bit ARM running QNX, the fixed point version was faster. Another phenomenon I noticed is that the fixed point version was less precise, with little performance gain in most cases. When running the fixed point algorithm, you will notice what looks like dust in the image. Those are instances of integer overflow or underflow on individual pixels. Under normal operation, these would manifest as runtime exceptions in the application, but to increase performance, I compiled with those checks turned off.

Here’s the takeaway from this project: fixed point math performance is only better if you need lower precision OR a limited range of values. Keep in mind that this is an OR situation. You can either have a precise fixed point number with a small range, or an imprecise fixed point number with a larger range. If you try to have a precise fixed point number with a large range, the underlying integer type that is used will be quite large. If the type requires a 128 bit or larger integer type, then you lose all the performance you would have gained by using fixed point in the first place.
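The trade-off shows up directly in the type declaration, since the compiler derives the size of the underlying integer from delta and range taken together. The two types below are illustrative, not from the project:

```ada
--  Narrow range, high precision: roughly 2**28 distinct values,
--  comfortably represented in a 32-bit integer.
type Precise_Narrow is delta 2.0 ** (-21) range -100.0 .. 100.0;

--  Wide range AND high precision: roughly 2**71 distinct values,
--  which would need a 128-bit underlying integer. Compilers
--  typically reject such a declaration, and any software-emulated
--  arithmetic would erase the performance benefit of fixed point.
--  type Precise_Wide is
--    delta 2.0 ** (-50) range -1_000_000.0 .. 1_000_000.0;
```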

In the case of the fractal, the fixed point is useless because we need a high precision with a large range. In this respect, we are stuck with floating point, or weird looking dusty fixed point fractals.

Conclusion

Although this was an interesting exercise to compare the two types of performances in a pure math situation, it isn’t entirely meaningful. If this was a production application, we would be optimizing the algorithms for the types being used. In this case, the algorithm was very generic and didn’t account for the limited range or precision that would be necessary for an optimized fixed point type. This is why the floating point performance was comparable in most cases to the fixed point with better visual results.

However, there are a few important design paradigms that were interesting to implement. The task pool is a useful concept for any parallelized application such as this. And the polymorphism via generics that we used to create the Real type can be extremely useful to give you greater flexibility to have multiple build or feature configurations.

With object-oriented programming languages like Ada and C++, it is sometimes tempting to start designing an application like this using base classes and derivation to implement new functionality. In this application that wouldn’t have made sense because we don’t need dynamic polymorphism; everything is known at compile time. So instead, we can use generics to achieve the polymorphism without creating the overhead associated with tagged types and classes.

]]>

The AdaFractal project is exactly what it sounds like: fractals generated using Ada. The code, which can be found here, calculates a Mandelbrot set using a fairly common calculation. Nothing fancy going on here. However, how are we going to display the fractal once it’s calculated?

To Graphics Library or not to Graphics Library?

We could use GTKAda, the Ada binding to GTK, to display a window… But that means we are stuck with GTK which really is overkill for displaying our little image, and is only supported on Windows and Linux... And Solaris if you’re into that sort of thing...

What’s more portable and easier to handle than GTK? Well, doing a GUI in web based technology is very easy. With a few lines of HTML, some Javascript, and some simple CSS we can throw together an entire Instagram of fractals... Fractalgram? Does that exist already? If it doesn’t, someone should get on that.

The Server Solution

The last piece of the puzzle is the ability to serve our web files and fractal image to a web browser for rendering. Since we want to ensure portability, we should probably leverage the inherent portability of Ada and use AWS. No, it’s not the same aws that you may use for cloud storage. This AWS is the Ada Web Server, a small and powerful HTTP server that has support for SOAP/WSDL, Server Push, HTTPS/SSL and client HTTP. If you want to know more about the other aws, check out the blog post Amazon Relies on Formal Methods for the Security of AWS. For the rest of this blog series, let’s assume AWS stands for Ada Web Server.

Because AWS is written entirely in Ada, it is incredibly portable. The current build targets include UNIX-like OS’s, Windows, macOS, and VxWorks for operating systems and ARM, x86, and PPC for target architectures. Just by changing the compiler I have been able to recompile and run this application, with no source code differences, for QNX on 64 bit ARM, macOS, Windows, and Linux on 64 bit x86, and VxWorks 6.9 on 32 bit ARM. That’s pretty portable!

For those who have access to GNATPro on Windows, Linux, or macOS, you can find AWS included with the toolsuite installation. For GNAT Community users, or GNATPro users who would like to compile AWS for a different target, you can build the library from the source located on the AdaCore GitHub.

Hooking up the pieces

Hooking up the fractal application to AWS was pretty easy. It only required creating a URI router which takes incoming GET requests and dispatches them to the correct worker function. The full router tree can be found in the router_cb.adb file.

function Router (Request : AWS.Status.Data) return AWS.Response.Data is
   URI      : constant String := AWS.Status.URI (Request);
   Filename : constant String := "web/" & URI (2 .. URI'Last);
begin
   --  Main server access point
   if URI = "/" then
      --  main page
      return AWS.Response.File (AWS.MIME.Text_HTML, "web/html/index.html");

   --  Requests a new image from the server
   elsif URI = "/fractal" then
      return AWS.Response.Build
        (Content_Type => AWS.MIME.Application_Octet_Stream,
         Message_Body => Compute_Image);

   --  Serve basic files
   elsif AWS.Utils.Is_Regular_File (Filename) then
      return AWS.Response.File
        (Content_Type => AWS.MIME.Content_Type (Filename),
         Filename     => Filename);

   else
      Put_Line ("Could not find file: " & Filename);

      return AWS.Response.Acknowledge (AWS.Messages.S404);
   end if;
end Router;

Here is a sample of a simplified router tree. The Router function is registered as a callback with the AWS framework. It is called whenever a new GET request is received by the server. If the request is for the index page (“/”), we send back the index.html page. If we receive “/fractal” we recompute the fractal image and return the pixel buffer. If the URI is a file, like a css or js file, we simply serve the file. If it’s anything else, we respond with a 404 error.
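For completeness, wiring such a callback into AWS is typically a single call at start-up. Here is a hedged sketch; the server name and shutdown handling are illustrative, not taken from the AdaFractal sources:

```ada
with AWS.Server;

procedure Start_Fractal_Server is
   WS : AWS.Server.HTTP;
begin
   --  Register the Router callback and listen on port 8080.
   AWS.Server.Start
     (WS,
      Name     => "AdaFractal",
      Callback => Router_Cb.Router'Access,
      Port     => 8080);

   --  Block until the server is asked to terminate.
   AWS.Server.Wait;
end Start_Fractal_Server;
```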

The AdaFractal app uses a slightly more complex router tree in order to handle input from the user, compute server side zoom of the fractal, and handle computing images of different sizes based on the size of the browser. On the front-end, all of these requests and responses are handled by jQuery. The image that is computed on the backend is an array of pixels using the following layout:

type Pixel is record
   Red   : Color;
   Green : Color;
   Blue  : Color;
   Alpha : Color;
end record
with Size => 32;

for Pixel use record
   Red   at 0 range 0 .. 7;
   Green at 1 range 0 .. 7;
   Blue  at 2 range 0 .. 7;
   Alpha at 3 range 0 .. 7;
end record;

type Pixel_Array is array (Natural range <>) of Pixel
with Pack;

A Pixel_Array with the computed image is sent as the response to the fractal request. The JavaScript on the front-end overlays it onto a Canvas object, which is rendered on the page. No fancy backend image library required!

AWS serves our application on whichever port we specify (the default is 8080). We can either point our browser to localhost:8080 (127.0.0.1:8080) or, if we have a headless device (such as a Raspberry Pi, a QNX board, or a VxWorks target), point a browser on another device to the IP address of the headless device, port 8080, to view the fractal.
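For reference, here is a minimal sketch of how such a callback can be wired into an AWS server. The package name Router_Cb, the server name string, and the overall structure are assumptions for illustration; the AdaFractal sources may organize this differently.

```ada
with AWS.Server;
with Router_Cb;  --  hypothetical package exposing the Router callback

procedure Start_Fractal_Server is
   WS : AWS.Server.HTTP;
begin
   --  Register Router as the request callback and listen on port 8080
   AWS.Server.Start
     (WS, "fractal", Callback => Router_Cb.Router'Access, Port => 8080);

   --  Block until the user presses 'q' on the console, then shut down
   AWS.Server.Wait (AWS.Server.Q_Key_Pressed);
   AWS.Server.Shutdown (WS);
end Start_Fractal_Server;
```

Once the server is started, every incoming request is dispatched to Router, so the whole application surface is defined by that one function.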

How about the math and performance?

An interesting side experiment came from this project after trying to increase the performance of the application. After implementing a thread pool to parallelize the math, I decided to try using fixed point types for the math. The implementation of the thread pool and the fixed point versus floating point math performance will be presented in Part 2 of this blog post.


Over the past several years, a great number of public announcements have been made about companies that are either studying or adopting the Ada and SPARK programming languages. Noteworthy examples include Dolby, Denso, LASP and Real Heart, as well as the French Security Agency.

Today, NVIDIA, the inventor and market leader of the Graphical Processing Unit (GPU), joins the wave by announcing the selection of SPARK and Ada for its next generation of security-critical GPU firmware running on RISC-V microcontrollers. NVIDIA’s announcement marks a new era of industrial standards for safety- and security-critical software development.

Proving Memory Operations - A SPARK Journey
Quentin Ochem, Tue, 08 Jan 2019 13:33:00 +0000
https://blog.adacore.com/proving-memory-operations-a-spark-journey

The promise behind the SPARK language is the ability to formally demonstrate properties in your code regardless of the input values that are supplied - as long as those values satisfy specified constraints. As such, this is quite different from static analysis tools such as our CodePeer or the typical offerings available for, e.g., the C language, which trade completeness for efficiency in the name of pragmatism. Indeed, the problem they’re trying to solve - finding bugs in existing applications - makes it impossible to be complete. Or, if completeness is achieved, it is at the cost of a massive number of uncertainties (“false alarms”). SPARK takes a different approach. It requires the programmer to stay within the boundaries of a (relatively large) Ada language subset and to annotate the source code with additional information - with the benefit of being complete (or sound) in the verification of certain properties, and without inundating the programmer with false alarms.

With this blog post, I’m going to explore the extent to which SPARK can fulfill this promise when put in the hands of a software developer with no particular affinity to math or formal methods. To make it more interesting, I’m going to work in the context of a small low-level application to gauge how SPARK is applicable to embedded device level code, with some flavor of cyber-security.

The problem to solve and its properties

As a prelude, even prior to defining any behavior and any custom property on this code, the SPARK language itself defines a default property, the so-called absence of run-time errors. These include out-of-bounds accesses to arrays, out-of-range assignments to variables, division by zero, etc. This is one of the most advanced properties that traditional static analyzers can consider. With SPARK, we’re going to go much further than that, and actually describe functional requirements.

Let’s assume that I’m working on a piece of low-level device driver whose job is to set and move the boundaries of two secure areas in the memory: a secure stack and a secure heap area. When set, these areas come with specific registers that prevent non-secure code from reading the contents of any storage within these boundaries. This process is guaranteed by the hardware, and I’m not modeling this part. However, when the boundaries are moved, the data that was in the old stack and heap but not in the new one becomes accessible. Unless it is erased, it can be read by non-secure code and thus leak confidential information. Moreover, the code that changes the boundaries has to be as efficient as possible, and I need to ensure that I don’t erase memory still within the secure area.

What I described above, informally, is the full functional behavior of this small piece of code. This could be expressed as a boolean expression that looks roughly like: $dataToErase = (oldStack \cup oldHeap) \cap \overline{(newStack \cup newHeap)}$. Or in other words, the data to erase is everything that was in the previous memory ($$oldStack \cup oldHeap$$) and ($$\cap$$) not in the new memory ($$\overline{(newStack \cup newHeap)}$$).

Another way to write the same property is to use a quantifier on every byte of memory, and say that on every byte, if this byte is in the old stack or the old heap but not in the new stack or the new heap, it should be erased: $\forall b \in memory, ((isOldStack(b) \lor isOldHeap (b)) \land \neg (isNewStack (b) \lor isNewHeap (b)) \iff isErased (b))$ Which means that for all the bytes in the memory ($$\forall b \in memory$$) what’s in the old regions ($$isOldStack(b) \lor isOldHeap (b)$$) but not in the new ones ($$\neg (isNewStack (b) \lor isNewHeap (b))$$) has to be erased ($$\iff isErased (b)$$).

We will also need to demonstrate that the heap and the stack are disjoint.

Ideally, we’d like to have SPARK make the link between these two ways of expressing things, as one may be easier to express than the other.

When designing the above properties, it became quite clear that I needed some intermediary library with set operations, in order to be able to express unions, intersections and complement operations. This will come with its own set of lower-level properties to prove and demonstrate.

Let’s now look at how to define the specification for this memory information.

Memory Specification and Operations

The first step is to define some way to track the properties of the memory - that is whether a specific byte is a byte of heap, of stack, and what kind of other properties they can be associated with (like, has it been erased?). The interesting point here is that the final program executable should not have to worry about these values - not only would it be expensive, it wouldn’t be practical either. We can’t easily store and track the status of every single byte. These properties should only be tracked for the purpose of statically verifying the required behavior.

There is a way in SPARK to designate code to be solely used for the purpose of assertion and formal verification, through so-called “ghost code”. This code will not be compiled to binary unless assertions are activated at run-time. But here we’ll push this one step further by writing ghost code that can’t be compiled in the first place. Giving up on compilability altogether allows us to write assertions describing the entire memory, which would be impossible to compile or run.

The first step is to model an address. To be as close as possible to the actual way memory is defined, and to have access to Ada’s bitwise operations, we’re going to use a modular type. It turns out that this introduces a few difficulties: a modular type wraps around, so adding one to the last value goes back to the first one. However, in order to prove absence of run-time errors, we want to demonstrate that we never overflow. To do that, we can define a precondition on the “+” and “-” operators, with an initial attempt to define the relevant preconditions:

function "+" (Left, Right : Address_Type) return Address_Type
is (Left + Right)
with Pre => Address_Type'Last - Left >= Right;

function "-" (Left, Right : Address_Type) return Address_Type
is (Left - Right)
with Pre => Left >= Right;

The preconditions verify that Left + Right doesn’t exceed Address_Type'Last (for “+”) and that Left - Right doesn’t go below zero (for “-”). Interestingly, we could have been tempted to write the first precondition the following way:

with Pre => Left + Right <= Address_Type'Last; 

However, with wrap-around semantics applying inside the precondition itself, this would always be true.
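To see the pitfall concretely, here is a small illustration on a plain modular type (the type name is just for the example):

```ada
declare
   type Mod32 is mod 2 ** 32;  --  stand-in for the address type
   X : constant Mod32 := Mod32'Last;
begin
   --  X + 1 wraps around to 0, so "X + 1 <= Mod32'Last" holds even
   --  though an overflow just occurred: the comparison can never
   --  detect the condition it was meant to guard against.
   pragma Assert (X + 1 = 0);
end;
```

This is why the precondition has to be rearranged so that the check itself cannot overflow.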

There’s still a problem in the above code, due to the fact that “+” is implemented in terms of “+” itself (hence there is an infinite recursion in the above code). The same goes for “-”. To avoid that, we’re going to introduce a new type “Address_Type_Base” to do the computation without contracts, “Address_Type” being the actual type in use. The full code, together with some size constants (assuming 32 bits), then becomes:

Word_Size   : constant := 32;
Memory_Size : constant := 2 ** Word_Size;

type Address_Type_Base is mod Memory_Size;  -- 0 .. 2**Word_Size - 1
type Address_Type is new Address_Type_Base;

function "+" (Left, Right : Address_Type) return Address_Type
is (Address_Type (Address_Type_Base (Left) + Address_Type_Base (Right)))
with Pre => Address_Type'Last - Left >= Right;

function "-" (Left, Right : Address_Type) return Address_Type
is (Address_Type (Address_Type_Base (Left) - Address_Type_Base (Right)))
with Pre => Left >= Right;


Armed with the above types, it’s now time to get started on the modeling of actual memory. We’re going to track the status associated with every byte. Namely, whether a given byte is part of the Stack, part of the Heap, or neither; and whether that byte has been Scrubbed (erased). The prover will reason on the entire memory. However, the status tracking will never exist in the program itself - it will just be too costly to keep all this data at run time. Therefore we’re going to declare all of these entities as Ghost (they are here only for the purpose of contracts and assertions), and we will never enable run-time assertions. The code looks like:

type Byte_Property is record
   Stack    : Boolean;
   Heap     : Boolean;
   Scrubbed : Boolean;
end record
with Ghost;

type Memory_Type is array (Address_Type) of Byte_Property
with Ghost;

Memory : Memory_Type
with Ghost;

Here, Memory is a huge array declared as a global ghost variable. We can’t write executable code with it, but we can write contracts. In particular, we can define a contract for a function that sets the heap between two address values. As a precondition for this function, the lower value has to be below or equal to the upper one. As a postcondition, the property of the memory in the range will be set to Heap. The specification looks like this:

procedure Set_Heap (From, To : Address_Type)
with
   Pre    => To >= From,
   Post   => (for all B in Address_Type =>
                (if B in From .. To
                 then Memory (B).Heap
                 else not Memory (B).Heap)),
   Global => (In_Out => Memory);


Note that I’m also defining a Global contract here, which states how Memory is accessed. Here it’s modified, hence In_Out.

While the above specification is correct, it’s also incomplete. We’re defining what happens to the Heap property, but not to the others. What we expect here is that the rest of the memory is unmodified. Another way to say this is that only the range From .. To is updated; the rest is unchanged. This can be modelled through the 'Update attribute, turning the postcondition into:

Post => (for all B in Address_Type =>
           (if B in From .. To
            then Memory (B) = Memory'Old (B)'Update (Heap => True)
            else Memory (B) = Memory'Old (B)'Update (Heap => False))),


Literally meaning “The new value of memory equals the old value of memory (Memory’Old) with changes (‘Update) being Heap = True from From to To, and False elsewhere“.

Forgetting to mention what doesn’t change is a common mistake when defining contracts. It is also a source of difficulty when proving code, so it’s a good rule of thumb to always consider what’s unchanged when writing these postconditions. Of course, the only relevant entities are those accessed and modified by the subprogram; any variable not accessed is by definition unchanged.
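By symmetry, the Set_Stack operation used later in the implementation would get the analogous contract. This is a sketch under that assumption; only the updated field differs from Set_Heap:

```ada
procedure Set_Stack (From, To : Address_Type)
with
   Pre    => To >= From,
   Post   => (for all B in Address_Type =>
                (if B in From .. To
                 then Memory (B) = Memory'Old (B)'Update (Stack => True)
                 else Memory (B) = Memory'Old (B)'Update (Stack => False))),
   Global => (In_Out => Memory);
```

Note how the 'Update form again pins down both what changes (the Stack flag in the range) and what stays the same (everything else).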

Let’s now get to the meat of this requirement. We’re going to develop a function that moves the heap and the stack boundaries, and scrubs all that’s necessary and nothing more. The procedure will set the new heap boundaries between Heap_From .. Heap_To, and stack boundaries between Stack_From and Stack_To, and is defined as such:

procedure Move_Heap_And_Stack
  (Heap_From, Heap_To, Stack_From, Stack_To : Address_Type);

Now remember the expression of the requirement from the previous section:

$\forall b \in memory, ((isOldStack(b) \lor isOldHeap (b)) \land \neg (isNewStack (b) \lor isNewHeap (b)) \iff isErased (b))$

This happens to be a form that can be easily expressed as a quantifier, on the Memory array described before:

(for all B in Address_Type =>
   (((Memory'Old (B).Stack or Memory'Old (B).Heap)
     and then not (Memory (B).Stack or Memory (B).Heap))
    = Memory (B).Scrubbed));


The above is literally a transcription of the property: for every byte B in the address range, if the old memory is Stack or Heap and the new memory is not, then the byte is scrubbed, and otherwise it is not. This contract is going to be the postcondition of Move_Heap_And_Stack - the state of the system after the call.

Interestingly, for this to work properly, we’re also going to need to verify the consistency of the values of Heap_From, Heap_To, Stack_From and Stack_To - namely that each From is below its To and that there’s no overlap between heap and stack. This will be the precondition of the call:

((Stack_From <= Stack_To and then Heap_From <= Heap_To) and then
 Heap_From  not in Stack_From .. Stack_To and then
 Heap_To    not in Stack_From .. Stack_To and then
 Stack_From not in Heap_From .. Heap_To and then
 Stack_To   not in Heap_From .. Heap_To);

That’s enough for now at this stage of the demonstration. We have specified the full functional property to be demonstrated. Next step is to implement, and prove this implementation.

Low-level Set Library

In order to implement the service, it would be useful to have a lower-level library that implements sets of address ranges, together with union, intersection and complement functions. This library will be called to determine which regions to erase - it will also be at the basis of the proof. Because it is going to be used at run-time, we need a very efficient implementation. A set of ranges is going to be defined as an ordered, disjoint list of ranges. Using a record with discriminant, it looks like:

type Area is record
   From, To : Address_Type;
end record
with Predicate => From <= To;

type Area_Array is array (Natural range <>) of Area;

type Set_Base (Max : Natural) is record
   Size  : Natural;
   Areas : Area_Array (1 .. Max);
end record;


Note that I call this record Set_Base, and not Set. Bear with me.

You may already notice a first functional predicate above. In the Area definition, the fields From and To are constrained such that From is always below To. This check is very similar to an Ada range check in terms of where it applies - but on a more complicated property. For Set, I’m also going to express the property I described before: that areas are ordered (which can be expressed as the fact that the To value of element N is below the From value of element N + 1) and disjoint (the From of element N + 1 minus the To of element N is greater than 1). There’s another implicit property to be specified, which is that the field Size is below or equal to the Max size of the array. Being able to name and manipulate this specific property has some use, so I’m going to name it in an expression function:

function Is_Consistent (S : Set_Base) return Boolean is
  (S.Size <= S.Max and then
   (for all I in 1 .. S.Size - 1 =>
      S.Areas (I).To < S.Areas (I + 1).From and then
      S.Areas (I + 1).From - S.Areas (I).To > 1));

Now comes the predicate. If I were to write:

type Set_Base (Max : Natural) is record
   Size  : Natural;
   Areas : Area_Array (1 .. Max);
end record
with Predicate => Is_Consistent (Set_Base);


I would have a recursive predicate call. Predicates are checked in particular when passing parameters, so Is_Consistent would check the predicate of Set_Base, which is a call to Is_Consistent, which would then check the predicate and so on. To avoid this, the predicate is actually applied to a subtype:

subtype Set is Set_Base
with Predicate => Is_Consistent (Set);


As it will later turn out, this property is fundamental to the ability to prove other properties. At this stage, it’s already nice to see a non-trivial property being expressed: namely, that the structure is compact (it doesn’t waste space by having consecutive areas that could be merged into one; said otherwise, all areas are separated by at least one excluded byte).

The formal properties expressed in the next steps will be defined in the form of inclusion - if something is included somewhere then it may or may not be included somewhere else. This inclusion property is expressed as a quantifier over all the ranges. It’s not meant to be run by the program, but only for the purpose of property definition and proof. The function defining that a given byte is included in a set of memory ranges can be expressed as follows:

function Includes (B : Address_Type; S : Set) return Boolean
is (for some I in 1 .. S.Size =>
      B in S.Areas (I).From .. S.Areas (I).To)
with Ghost;


Which means that B is included in the set S if B is included in at least one (some) of the areas of the set.

I’m now going to declare a constructor “Create” together with three operators, “or”, “and”, “not” which will respectively implement union, intersection and complement. For each of those, I need to provide some expression of the maximum size of the set before and after the operation, as well as the relationship between what’s included in the input and in the output.

The specification of the function Create is straightforward. It takes a range as input, and creates a set where all elements within this range are contained in the resulting set. This reads:

function Create (From, To : Address_Type) return Set
with
   Pre  => From <= To,
   Post => Create'Result.Max = 1
     and then Create'Result.Size = 1
     and then (for all B in Address_Type =>
                 Includes (B, Create'Result) = (B in From .. To));


Note that, interestingly, the internal implementation of the Set isn’t exposed by the property computing the inclusion. I’m only stating what should be included without giving details on how it is stored. Also note that, as in many other places, this postcondition isn’t really something we’d like to execute (it could be a very long loop for a large area). However, it’s a good way to model our requirement.

Let’s carry on with “not”. A quick reasoning shows that, at worst, the result of “not” is one area bigger than the input. We’ll need a precondition checking that Size can indeed be incremented (it does not exceed the last value of the type). The postcondition is that Size has been incremented by at most one, and that values that were not in the input set are in the resulting one, and vice versa. The operator with its precondition and postcondition reads:

function "not" (S : Set) return Set
with
   Pre  => Positive'Last - S.Size > 0,
   Post =>
     (for all B in Address_Type =>
        Includes (B, "not"'Result) /= Includes (B, S))
     and then "not"'Result.Size <= S.Size + 1;
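To see where the “+ 1” bound comes from, consider a single interior area. This is a purely illustrative sketch using the operations above; the hexadecimal bounds are arbitrary:

```ada
--  Complementing one interior area splits the address space in two:
--
--     S     = { [16#1000# .. 16#1FFF#] }                -- Size = 1
--     not S = { [0 .. 16#0FFF#], [16#2000# .. 'Last] }  -- Size = 2
--
declare
   S : constant Set := Create (16#1000#, 16#1FFF#);
   C : constant Set := not S;
begin
   pragma Assert (not Includes (16#1234#, C));  --  inside S, so not in C
   pragma Assert (Includes (16#0FFF#, C));      --  just before S
   pragma Assert (Includes (16#2000#, C));      --  just after S
end;
```

Each area of S contributes at most one boundary split, so the complement can never grow by more than one area overall.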


The same reasoning can be applied to “and” and “or”, which leads to the following specifications:

function "or" (S1, S2 : Set) return Set
with
   Pre  => Positive'Last - S1.Size - S2.Size >= 0,
   Post => "or"'Result.Size <= S1.Size + S2.Size
     and Is_Consistent ("or"'Result)
     and (for all B in Address_Type =>
            Includes (B, "or"'Result) =
              (Includes (B, S1) or Includes (B, S2)));

function "and" (S1, S2 : Set) return Set
with
   Pre  => Positive'Last - S1.Size - S2.Size >= 0,
   Post => "and"'Result.Size <= S1.Size + S2.Size
     and (for all B in Address_Type =>
            Includes (B, "and"'Result) =
              (Includes (B, S1) and Includes (B, S2)));


Of course at this point, one might be tempted to first prove the library and then the user code. And indeed I was tempted and fell for it. However, as this turned out to be a more significant endeavor, let’s start by looking at the user code.

Move_Heap_And_Stack - the “easy” part

Armed with the Set library, implementing the move function is relatively straightforward. We’re using other services to get the heap and stack boundaries, then creating the sets, applying the proper operators to compute the list of areas to scrub, scrubbing those pieces of memory one by one, and finally setting the new heap and stack pointers.

procedure Move_Heap_And_Stack
  (Heap_From, Heap_To, Stack_From, Stack_To : Address_Type)
is
   Prev_Heap_From, Prev_Heap_To,
   Prev_Stack_From, Prev_Stack_To : Address_Type;
begin
   Get_Stack_Boundaries (Prev_Stack_From, Prev_Stack_To);
   Get_Heap_Boundaries (Prev_Heap_From, Prev_Heap_To);

   declare
      Prev : Set := Create (Prev_Heap_From, Prev_Heap_To) or
                    Create (Prev_Stack_From, Prev_Stack_To);
      Next : Set := Create (Heap_From, Heap_To) or
                    Create (Stack_From, Stack_To);
      To_Scrub : Set := Prev and not Next;
   begin
      for I in 1 .. To_Scrub.Size loop
         Scrub (To_Scrub.Areas (I).From, To_Scrub.Areas (I).To);
      end loop;

      Set_Stack (Stack_From, Stack_To);
      Set_Heap (Heap_From, Heap_To);
   end;
end Move_Heap_And_Stack;


Now let’s dive into the proof. As a disclaimer, the proofs we’re going to do from now on are hard. One doesn’t need to go this far to take advantage of SPARK. As a matter of fact, defining requirements formally already takes good advantage of the technology. Most people only prove data flow or absence of run-time errors, which is already a huge win. The next level is proving some key functional properties. We’re going one level beyond that and entirely proving all the functionality. The advanced topics introduced in this section, such as lemmas and loop invariants, are mostly needed for these advanced levels.

The first step is to reset the knowledge we have about the scrubbing state of the memory. Remember that the ghost Memory array is only there to track the status of the memory; it does not correspond to any real code. To reset the flags, we’re going to create a special ghost procedure whose sole purpose is to clear them:

procedure Reset_Scrub
with
   Post => (for all B in Address_Type =>
              Memory (B) = Memory'Old (B)'Update (Scrubbed => False)),
   Ghost;


In theory, it is not absolutely necessary to provide an implementation to this procedure if it’s not meant to be compiled. Knowing what it’s supposed to do is good enough here. However, it can be useful to provide a ghost implementation to describe how it would operate. The implementation is straightforward:

procedure Reset_Scrub is
begin
   for B in Address_Type loop
      Memory (B).Scrubbed := False;
   end loop;
end Reset_Scrub;

We’re now going to hit our first advanced proof topic. While extremely trivial, the above code doesn’t prove, and the reason it doesn’t is that it contains a loop. Loops are difficult for provers, and as of today they need help to be broken down into sequential pieces. While the developer sees one loop, SPARK sees three different pieces of code to prove, connected by a so-called loop invariant which summarizes the behavior of the loop:

procedure Reset_Scrub is
begin
   Memory (B).Scrubbed := False;
   [loop invariant]


Then:

         [loop invariant]
B := B + 1;
Memory (B).Scrubbed := False;
[loop invariant]

And eventually:

         [loop invariant]
end loop;
end;


The difficulty is now about finding this invariant that is true on all these sections of the code, and that ends up proving the postcondition. To establish those, it’s important to look at what needs to be proven at the end of the loop. Here it would be that the entire array has Scrubbed = False, and that the other fields still have the same value as at the entrance of the loop (expressed using the attribute ‘Loop_Entry):

(for all X in Address_Type =>
Memory (X) = Memory'Loop_Entry(X)'Update (Scrubbed => False))


Then to establish the loop invariant, the question is how much of this is true at each step of the loop. The answer here is that this is true up until the value B. The loop invariant then becomes:

(for all X in Address_Type'First .. B =>
Memory (X) = Memory'Loop_Entry(X)'Update (Scrubbed => False))


Which can be inserted in the code:

procedure Reset_Scrub is
begin
   for B in Address_Type loop
      Memory (B).Scrubbed := False;
      pragma Loop_Invariant
        (for all X in Address_Type'First .. B =>
           Memory (X) =
             Memory'Loop_Entry (X)'Update (Scrubbed => False));
   end loop;
end Reset_Scrub;


Back to the main code. We can now insert a call to Reset_Scrub before performing the actual scrubbing. This will not do anything in the actual executable code, but will tell the prover to consider that the ghost values are reset. Next, we have another loop scrubbing subsection:

for I in 1 .. To_Scrub.Size loop
   Scrub (To_Scrub.Areas (I).From, To_Scrub.Areas (I).To);
end loop;


Same as before, the question is what property is true at the end of the loop, and how to break it into what’s true at each iteration. The property true at the end is that everything included in the To_Scrub set has indeed been scrubbed:

(for all B in Address_Type =>
Includes (B, To_Scrub) = Memory (B).Scrubbed);

This gives us the following loop with its invariant:

for I in 1 .. To_Scrub.Size loop
   Scrub (To_Scrub.Areas (I).From, To_Scrub.Areas (I).To);

   -- everything up to the current area is properly scrubbed
   pragma Loop_Invariant
     (for all B in Address_Type'First .. To_Scrub.Areas (I).To =>
        Includes (B, To_Scrub) = Memory (B).Scrubbed);
end loop;

So far, this should be enough. Establishing these loop invariants may look a little intimidating at first, but with a little practice they rapidly become straightforward.

Unfortunately, this is not enough.

Implementing Unproven Interaction with Registers

As this is a low level code, some data will not be proven in the SPARK sense. Take for example the calls that read the memory boundaries:

procedure Get_Heap_Boundaries (From, To : out Address_Type)
with Post => (for all B in Address_Type =>
                (Memory (B).Heap = (B in From .. To)))
  and then From <= To;

procedure Get_Stack_Boundaries (From, To : out Address_Type)
with Post => (for all B in Address_Type =>
                (Memory (B).Stack = (B in From .. To)))
  and then From <= To;


The values From and To possibly come from registers. SPARK wouldn’t necessarily be able to make the link with the postcondition, simply because the data is outside of its analysis. In this case, it’s perfectly fine to just tell SPARK a fact without having to prove it. There are two ways to do this; one is to deactivate SPARK on the entire subprogram:

procedure Get_Heap_Boundaries (From, To : out Address_Type)
with SPARK_Mode => Off
is
begin
   -- code
end Get_Heap_Boundaries;

In this case, SPARK will just assume that the postcondition is correct. The issue is that there’s no SPARK analysis on the entire subprogram, which may be too much. An alternative solution is just to state the fact that the postcondition is true at the end of the subprogram:

procedure Get_Heap_Boundaries (From, To : out Address_Type)
is
begin
   -- code

   pragma Assume
     (for all B in Address_Type => (Memory (B).Heap = (B in From .. To)));
end Get_Heap_Boundaries;

In this example, to illustrate the above, the registers will be modeled as global variables read and written from C - which is outside of SPARK analysis as registers would be.

Move_Heap_And_Stack - the “hard” part

Before diving into what’s next and scaring readers away from ever doing proof, let’s step back a little. We’re currently attempting the hardest level of proof - platinum - that is, fully proving a program’s functional behavior. There is a lot of benefit to be had from SPARK before ever reaching this stage. The subset of the language alone provides more analyzable code. Flow analysis lets you easily spot uninitialized data. Run-time errors such as buffer overflows are relatively easy to clear out, and even simple gold-level property demonstration is within reach of most software engineers after a little training.

Full functional proof - that is, complex property demonstration - is hard. It is also not usually required. But if this is what you want to do, a fundamental shift of mindset is required. As it turns out, it took me a good week to understand that. For a week I was trying to build a proof from bottom to top, adding various assertions left and right, trying to make things fit SPARK’s expectations. To absolutely no result.

And then, in the midst of despair, the apple fell on my head.

The prover is less clever than you think. It’s like a kid learning math. It’s not going to be able to build a complex demonstration to prove the code by itself. Asking it to prove the code is not the right approach. The right approach is to build the demonstration of the code’s correctness, step by step, and to ask the prover to verify that this demonstration is correct. The demonstration is a program in itself, with its own subprograms, its own data and control flow, its own engineering and architecture. What the prover does is prove that the demonstration of the code is correct; and as the demonstration is linked to the code, the code is indirectly proven correct as well.

Now reversing the logic, how could I prove this small loop? One way to work that out is to describe scrubbed and unscrubbed areas one by one:

• On the first area to scrub, if it doesn’t start at the beginning of the memory, everything before its start has not been scrubbed.

• When working on any area I beyond the first one, everything between the previous area I - 1 and the current one has not been scrubbed.

• At the end of an iteration, everything beyond the current area I is unscrubbed.

• At the end of the last iteration, everything beyond the last area is unscrubbed.

The first step to help the prover is to translate all of these into assertions, and to see if these steps are small enough for the demonstration to be proven - and for the demonstration to indeed prove the code. At this stage, it’s not a bad idea to express the assertions in the loop as loop invariants, as we want the information to be carried from one iteration to the next. This leads to the following code:

for I in 1 .. To_Scrub.Size loop
   Scrub (To_Scrub.Areas (I).From, To_Scrub.Areas (I).To);

   -- the gap between the previous area and this one is not scrubbed
   pragma Loop_Invariant
     (if I > 1 then
        (for all B in Address_Type range
           To_Scrub.Areas (I - 1).To + 1 .. To_Scrub.Areas (I).From - 1 =>
             not Memory (B).Scrubbed));

   -- nothing before the first area is scrubbed
   pragma Loop_Invariant
     (if To_Scrub.Areas (1).From > Address_Type'First
      then (for all B in Address_Type'First .. To_Scrub.Areas (1).From - 1 =>
              not Memory (B).Scrubbed));

   -- nothing beyond the current area is scrubbed
   pragma Loop_Invariant
     (if To_Scrub.Areas (I).To < Address_Type'Last then
        (for all B in To_Scrub.Areas (I).To + 1 .. Address_Type'Last =>
           not Memory (B).Scrubbed));

   -- everything up to the current area is properly scrubbed
   pragma Loop_Invariant
     (for all B in Address_Type'First .. To_Scrub.Areas (I).To =>
        Includes (B, To_Scrub) = Memory (B).Scrubbed);
end loop;

pragma Assert
  (if To_Scrub.Size >= 1
     and then To_Scrub.Areas (To_Scrub.Size).To < Address_Type'Last
   then (for all B in To_Scrub.Areas (To_Scrub.Size).To + 1 ..
           Address_Type'Last => not Memory (B).Scrubbed));


Results are not too bad at first sight. Out of the five assertions, only two don’t prove. This may mean they’re wrong - it may also mean that SPARK needs some more help to prove them.

Let’s look at the first one in more detail:

pragma Loop_Invariant
  (if To_Scrub.Areas (1).From > Address_Type'First
   then (for all B in Address_Type'First .. To_Scrub.Areas (1).From - 1 =>
           not Memory (B).Scrubbed));


Now let’s put ourselves in SPARK’s shoes. The prover doesn’t believe that nothing is scrubbed before the first element. Why would that be the case? None of these bytes are in the To_Scrub set, right? Let’s check. To investigate this, the technique is to add assertions verifying intermediate steps, pretty much like what you’d do with a debugger. Let’s add an assertion before:

pragma Assert
  (if To_Scrub.Areas (1).From > Address_Type'First then
     (for all B in Address_Type'First .. To_Scrub.Areas (1).From - 1 =>
        not Includes (B, To_Scrub)));


That assertion doesn’t prove. But why would this be true? Recall that we have a consistency check for all sets, which is supposed to hold at this point, defined as:

function Is_Consistent (S : Set_Base) return Boolean is
  (S.Size <= S.Max
   and then
     (for all I in 1 .. S.Size - 1 =>
        S.Areas (I).To < S.Areas (I + 1).From
        and then S.Areas (I + 1).From - S.Areas (I).To > 1));


So looking at the above, if all areas come after the first one, there should be nothing included before the first one. If Is_Consistent is true for To_Scrub, then the assertion ought to be true as well. Yet SPARK doesn’t believe us.
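To make this concrete, here is a small illustration - hypothetical, since the exact Set declarations are not shown in this post; the field names only follow the uses of S.Size, S.Max and S.Areas (I).From / .To above - of a value that satisfies Is_Consistent:

```ada
--  Hypothetical example; the exact record declarations are an
--  assumption based on the field names used in this post.
--  The two areas are ordered (110 < 200) and separated by a gap of
--  more than one byte (200 - 110 > 1), so Is_Consistent holds.
--  Consequently no address in 0 .. 99 is included, which is exactly
--  what we would like the prover to conclude for the addresses
--  before the first From.
Example : constant Set :=
  (Max   => 2,
   Size  => 2,
   Areas => (1 => (From => 100, To => 110),
             2 => (From => 200, To => 210)));
```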

When reaching this kind of situation, it’s good practice to factor out the proof. The idea is to create a place where we say “given only these hypotheses, can you prove this conclusion?”. Sometimes SPARK gets lost in the wealth of information available, and just reducing the number of hypotheses to consider is enough to get it to figure something out.

Interestingly, this activity of factoring out a piece of proof is very close to what you’d do for a regular program. It’s easier for a developer to understand small pieces of code than one large flat program - and the prover is no different.

These factored-out proofs are typically referred to as lemmas. They are ghost procedures that prove a postcondition (the conclusion) from a minimal precondition (the hypotheses). By convention, we’ll prefix all their names with Lemma_. Our lemma looks like:

procedure Lemma_Nothing_Before_First (S : Set) with
  Ghost,
  Pre  => Is_Consistent (S),
  Post =>
    (if S.Size = 0 then
       (for all B in Address_Type => not Includes (B, S))
     elsif S.Areas (1).From > Address_Type'First then
       (for all B in Address_Type'First .. S.Areas (1).From - 1 =>
          not Includes (B, S)));


This states that if S is consistent, then either it’s empty (nothing is included), or none of the elements before the first From are included.

Now let’s see if reducing the scope of the proof is enough. Let’s just add an empty procedure:

procedure Lemma_Nothing_Before_First (S : Set) is
begin
   null;
end Lemma_Nothing_Before_First;


Still no good. That was a good try though. Assuming we believe the property to be correct (and it is), the game is now to demonstrate to SPARK how to go from the hypotheses to the conclusion.

To do so, we need to take into account one limitation of SPARK: it doesn’t do induction. This has a significant impact on what can be deduced from one part of the hypotheses:

(for all I in 1 .. S.Size - 1 =>
   S.Areas (I).To < S.Areas (I + 1).From
   and then S.Areas (I + 1).From - S.Areas (I).To > 1)


If each element “I” is below the element “I + 1”, then I would like to be able to check that each “I” is below all the “I + N” elements after it. This ability to jump from proving a one-by-one property to a property on the whole set is called induction, and it happens to be extremely hard for state-of-the-art provers. Here lies our key. We’re going to introduce a new lemma that starts from the same premise, and then demonstrates that all the areas after a given one are greater:

procedure Lemma_Order (S : Set) with
  Ghost,
  Pre  =>
    (for all I in 1 .. S.Size - 1 => S.Areas (I).To < S.Areas (I + 1).From),
  Post =>
    (for all I in 1 .. S.Size - 1 =>
       (for all J in I + 1 .. S.Size => S.Areas (I).To < S.Areas (J).From));


And we’re going to write the demonstration as a program:

procedure Lemma_Order (S : Set) is
begin
   if S.Size = 0 then
      return;
   end if;

   for I in 1 .. S.Size - 1 loop
      for J in I + 1 .. S.Size loop
         pragma Assert (S.Areas (J - 1).To < S.Areas (J).From);
         pragma Loop_Invariant
           (for all R in I + 1 .. J => S.Areas (I).To < S.Areas (R).From);
      end loop;

      pragma Loop_Invariant
        (for all R in 1 .. I =>
           (for all T in R + 1 .. S.Size => S.Areas (R).To < S.Areas (T).From));
   end loop;
end Lemma_Order;


As you can see, for each area I we check that the areas I + 1 .. Size are indeed greater. This happens to prove trivially with SPARK. We can now prove Lemma_Nothing_Before_First by applying Lemma_Order. To apply a lemma, we just call it like a regular procedure: its hypotheses (the precondition) will be checked by SPARK, and its conclusion (the postcondition) will be added to the list of hypotheses available to the proof:

procedure Lemma_Nothing_Before_First (S : Set) is
begin
   Lemma_Order (S);
end Lemma_Nothing_Before_First;


This now proves trivially. Back in the main loop, applying Lemma_Nothing_Before_First looks like this:

for I in 1 .. To_Scrub.Size loop
   Scrub (To_Scrub.Areas (I).From, To_Scrub.Areas (I).To);
   Lemma_Nothing_Before_First (To_Scrub);

   pragma Loop_Invariant
     (if I > 1 then
        (for all B in Address_Type range
           To_Scrub.Areas (I - 1).To + 1 .. To_Scrub.Areas (I).From - 1 =>
             not Memory (B).Scrubbed));

   pragma Loop_Invariant
     (if To_Scrub.Areas (1).From > Address_Type'First then
        (for all B in Address_Type'First .. To_Scrub.Areas (1).From - 1 =>
           not Memory (B).Scrubbed));

   pragma Loop_Invariant
     (if To_Scrub.Areas (I).To < Address_Type'Last then
        (for all B in To_Scrub.Areas (I).To + 1 .. Address_Type'Last =>
           not Memory (B).Scrubbed));

   pragma Loop_Invariant
     (for all B in Address_Type'First .. To_Scrub.Areas (I).To =>
        Includes (B, To_Scrub) = Memory (B).Scrubbed);
end loop;

pragma Assert
  (if To_Scrub.Size >= 1
     and then To_Scrub.Areas (To_Scrub.Size).To < Address_Type'Last
   then
     (for all B in To_Scrub.Areas (To_Scrub.Size).To + 1 .. Address_Type'Last =>
        not Memory (B).Scrubbed));


And voilà! One more loop invariant now proves properly.

At this point, it’s probably not worth diving into all the details of this small subprogram - the code is available here. There’s just more of the same.

The size of this small function is relatively reasonable. Now let’s give some insights on a much more difficult problem: the Set library.

A reasonable implementation comes to about 250 lines of code (it could actually be less if condensed, but let’s start with this). That’s a little less than a day of work for implementation and basic testing.

For the so-called Silver level - that is, absence of run-time errors - add maybe around 50 lines of assertions and half a day of work. Not too bad.

For the Gold level, I decided to prove one key property: that Is_Consistent holds after each operator. That took maybe a day of work, and another 150 lines of assertions or so. Still reasonable.

Platinum is about completely proving the functionality of the subprogram. And that proved (pun intended) to be a much, much more difficult experience. See this link and this link for other similar experiences. As a disclaimer, I am an experienced Ada developer but had relatively little experience with proof. I also selected a relatively hard problem - the quantified properties and the Set structure are quite different, and proving quantifiers is known to be hard for provers to start with. With that in mind, the solution I came up with spreads over almost a thousand lines of code - and consumed about a week and a half of effort.

I’m also linking here a solution that my colleague Claire Dross came up with. She’s one of our most senior experts in formal proof, and within a day she could prove the two most complex operators in about 300 lines of code (her implementation is also more compact than mine).

The above raises a question - is it really worth it? Silver absolutely is - it is difficult to make a case against spending a little more effort in exchange for the absolute certainty of never having a buffer overflow or a range check error. There’s no doubt that the time I spent proving this is less than what I would have spent debugging during testing or, worse, fixing errors in the later stages should this library be integrated into an actual product. Gold is also a relatively strong case. Selecting only key properties means concentrating on the relatively easy ones, and the confidence that they are enforced - and will never have to be tested - clearly outweighs the effort.

I also want to point out that the Platinum effort is well worth it on the user code in this example. While it looks tedious at first sight, getting these properties right is relatively straightforward, and it builds confidence in something that can’t easily be tested: a property over the whole memory.

Now the question remains - is it worth the effort on the Set library, to go from maybe two days of code + proof to around a week and a half?

I can argue it either way, but having to write 700 lines of code to demonstrate to the prover that what I wrote is correct keeps haunting me. Did I really have these 700 lines of reasoning in my head when I developed the code? Did I have confidence that each of those steps was logically linked to the next? To be fair, I did find errors in the code while writing them, but the code wasn’t fully tested when I started the proof. Would testing have found all the corner cases? How much time would such a corner case take to debug if found in a year? (See this blog post for some insights on hard-to-find bugs removed by proof.)

Some people who certify safety-critical software against e.g. avionics and railway standards end up writing 10 times more lines of test than lines of code - all the while verifying only samples of the potential data. In that situation, provided that the properties under test can be modelled by SPARK assertions and fit what the prover knows how to do, going through this level of effort is a very strong case.

Anything less is open for debate. I have to admit that, against all odds, it was a lot of fun, and I would personally look forward to taking on the challenge again. Whether my boss would allow me to is a different question. It all boils down to the cost of a failure versus the effort to prevent said failure. Being able to make an enlightened decision might be the most valuable outcome of having gone through the effort.

]]>
Amazon Relies on Formal Methods for the Security of AWS https://blog.adacore.com/amazon-relies-on-formal-methods-for-the-security-of-aws Tue, 23 Oct 2018 13:17:09 +0000 Yannick Moy https://blog.adacore.com/amazon-relies-on-formal-methods-for-the-security-of-aws

Byron Cook, who founded and leads the Automated Reasoning Group at Amazon Web Services (AWS) Security, gave a powerful talk at the Federated Logic Conference in July about how Amazon uses formal methods for ensuring the security of parts of AWS infrastructure. In the past four years, this group of 20+ has progressively hired well-known formal methods experts to face the growing demand inside AWS to develop tools based on formal verification for reasoning about cloud security.

Cook summarizes very succinctly the challenge his team is addressing at 17:25 in the recording: "How does AWS continue to scale quickly and securely?"

A message that Cook hammers home numerous times in his talk is that "soundness is key". See 25:05, where he explains that some customers value the security guarantees that AWS can offer with formal verification so much that it justified their move to AWS.

Even closer to what we do with SPARK, he talks at 26:42 about source code verification, and has this amaz(on)ing quote: "Proof is an accelerator for adoption. People are moving orders of magnitude workload more because they're like 'in my own data center I don't have proof' but there they have proofs."

In the companion article that was published at the conference, Cook gives more details about what the team has achieved and where they are heading now:

In 2017 alone the security team used deductive theorem provers or model checking tools to reason about cryptographic protocols/systems, hypervisors, boot-loaders/BIOS/firmware, garbage collectors, and network designs.

In many cases we use formal verification tools continuously to ensure that security is implemented as designed. In this scenario, whenever changes and updates to the service/feature are developed, the verification tool is reexecuted automatically prior to the deployment of the new version.

The customer reaction to features based on formal reasoning tools has been overwhelmingly positive, both anecdotally as well as quantitatively. Calls by AWS services to the automated reasoning tools increased by four orders of magnitude in 2017. With the formal verification tools providing the semantic foundation, customers can make stronger universal statements about their policies and networks and be confident that their assumptions are not violated.

While AWS certainly has unique security challenges that justify a strong investment in security, it's not unique in depending on complex software for its operations. What is unique so far is the level of investment at AWS in formal verification as a means to radically eliminate some security problems, both for them and for their customers.

This is certainly an approach we're eager to support with our own investment in the SPARK technology.

]]>

The challenge

Are you ready to develop a project to the highest levels of safety, security and reliability? If so, Make with Ada is the challenge for you! We’re calling on embedded developers across the globe to build cool embedded applications using the Ada and SPARK programming languages and are offering over $8000 in total prizes. In addition, eligible students will compete for a reward of an Analog Discovery 2 Pro Bundle worth $299.99!

For this competition, we chose not to set a project theme because we want you to be able to demonstrate your inventiveness and to work on a project that motivates you. We’re inviting you to contribute to the world of safe, secure and reliable software by using Ada/SPARK to build something that matters. Learn how to program in Ada and SPARK here – learn.adacore.com.

It's time to Make with Ada. Find out more on Hackster.io.

]]>

This course is geared to software professionals looking for a practical introduction to the Ada language with a focus on embedded systems, including real-time features as well as critical features introduced in Ada 2012. By attending this course you will understand and know how to use Ada for both sequential and concurrent applications, through a combination of live lectures from AdaCore's expert instructors and hands-on workshops using AdaCore's latest GNAT technology. AdaCore will provide an Ada 2012 tool-chain and ARM-based target boards for embedded workshops. No previous experience with Ada is required.

The course will be conducted in English.

Prerequisite: Knowledge of a programming language (Ada 83, C, C++, Java…).

Each participant should come with a computer having at least one USB port and running Windows.

For the full agenda and to register, visit: https://www.adacore.com/public...

Attachments

]]>

I was looking for a topic for my master thesis in embedded systems engineering when one of my advisors proposed the idea of programming a control system for autonomous trains in Ada. Since I am fascinated by the idea of autonomous vehicles, I agreed immediately without knowing Ada.

The obligatory "hello world" was written within a few minutes, so nothing could stop me from writing an application for autonomous trains. I was such a fool. The first days of coding were really hard, and it felt like writing a Shakespeare sonnet in a language I had never heard, but after a few days and a lot of reading in "Programming in Ada 2012" by John Barnes, I recognized the beauty and efficiency of this language.

Half a year had to go by before the first lines of code for my project could be written. First, I had to define requirements and choose which hardware to use. As a start, my advisor suggested the Raspberry Pi 3 as the implementation platform. Then I had to dive into the world of model trains to develop my first model railway layout.

A lot of soldering and wiring was required to build such a simple track (see image below). What a perfect job for a software developer, but as a prospective engineer nothing seemed impossible. It took two months until the whole hardware part was set up. Thanks to my colleagues, electronic engineers, we managed to set up everything as required and stabilized the signals until the interference from the turnout drives was eliminated.

The system architecture (see below) is a bit complicated since a lot of different components had to be taken into account.

To communicate with the trains, turnouts and train position detectors I used an Electronic Solutions Ulm Command Station 50210 (ECoS). It uses the DCC Railcom protocol to communicate with the attached hardware. There are two possibilities to interact with the ECoS - either an IP interface to send commands, or the touchscreen.

The Messaging Raspberry Pi (MRP) receives all messages of the system and continuously polls its interfaces to detect changes, for example if a button was pressed to call a train to a specific station. A colleague wrote the software for this part in C#. All messages are converted into a simple format, transferred to the Control Raspberry Pi, and finally interfaced to the Ada train control software.

A third microcontroller (an STM32F1) triggers the attached LEDs to display whether a part of the track can be passed through. The respective commands are sent via UART to the STM32F1. The STM32 software is written in C, but will be rewritten in Ada next winter semester by my students during their Ada lessons. If you allow me a side note: I was overwhelmed by the simplicity of Ada's multitasking mechanisms, so much so that I decided to change the content of my safety programming lectures from C and the MISRA C standard to Ada and SPARK - but that is another story.

A goal of my master thesis was to program an on-demand autonomous train system. Therefore, trains have to drive autonomously to the different stations, turnouts must be changed, signals must be set, and trains must be detected along the track. Each train has its own task storing and processing information: it has to identify its position on the track or the next station to stop at, and to offer information like which message it expects next. One main task receives all the messages from the MRP, analyzes them and passes them to the different train tasks. The main part of the software is to handle and analyze all the messages, since each component, like the signals or turnouts, has its own protocol to adhere to. Thankfully, rendezvous in Ada are very easy to implement, so it was fun to write this part of the code.
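To give an idea of how lightweight this is, here is a minimal, self-contained sketch of a rendezvous - with hypothetical names, not the actual project code - where the main program hands a message to a train task through an entry:

```ada
--  Minimal sketch (hypothetical names, not the project code): a train
--  task with a single entry.  The caller "applies" a message to a
--  train simply by calling the entry; caller and task synchronize at
--  the accept statement (the rendezvous) and then continue
--  independently.
with Ada.Text_IO; use Ada.Text_IO;

procedure Demo is
   type Message is (Station_Reached, Button_Pressed);

   task type Train_Task is
      entry Receive (M : Message);
   end Train_Task;

   task body Train_Task is
   begin
      loop
         select
            accept Receive (M : Message) do
               Put_Line ("Train got " & Message'Image (M));
            end Receive;
         or
            terminate;  --  lets the program finish cleanly
         end select;
      end loop;
   end Train_Task;

   Train_1 : Train_Task;
begin
   Train_1.Receive (Station_Reached);  --  rendezvous with the train task
end Demo;
```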

The following video shows a train which is called to a station and then drives to a terminus station after stopping at the station and picking up passengers. As the train starts from the middle station, another train has to drive to the middle station. The middle station has to be occupied at all times because waiting times must be kept as short as possible. The signals are not working in this video because I started a new project with a bigger track and had already detached most of the parts.

]]>

When I bought the TinyFPGA-BX board, I thought it would be an opportunity to play a little bit with FPGA, learn some Verilog or VHDL. But when I discovered that it was possible to have a RISC-V CPU on it, I knew I had to run Ada code on it.

The RISC-V CPU in question is the PicoRV32 from Clifford Wolf. It is written in Verilog and implements the RISC-V 32-bit instruction set (IMC extensions). In this blog post I will explain how I added support for this CPU and made an example project.

Compiler and run-time

More than a year ago I wrote a blog post about building an experimental Ada compiler and running code on the first RISC-V micro-controller, the HiFive1. Since then, we have released official support for RISC-V in the Pro and Community editions of GNAT, so you don’t even have to build the compiler anymore.

For the run-time, we will start from the existing HiFive1 run-time and change a few things to match the specs of the PicoRV32. As you will see it’s very easy.

Compared to the HiFive1, the PicoRV32 run-time will have:

• A different memory map (RAM and flash)

• A different text IO driver (UART)

• Different instruction set extensions

Memory map

This step is very simple, we just use linker script syntax to declare the two memory areas of the TinyFPGA-BX chip (ICE40):

MEMORY
{
   flash (rxai!w) : ORIGIN = 0x00050000, LENGTH = 0x100000
   ram   (wxa!ri) : ORIGIN = 0x00000000, LENGTH = 0x002000
}


Text IO driver

Again this is quite simple, the UART peripheral provided with PicoRV32 only has two registers. We first declare them at their respective addresses:

   UART_CLKDIV : Unsigned_32
     with Volatile, Address => ...;  --  clock divider register address

   UART_Data : Unsigned_32
     with Volatile, Address => ...;  --  data register address

Then the code just initializes the peripheral by setting the clock divider register to get a 115200 baud rate, and sends characters to the data register:

   procedure Initialize is
   begin
      UART_CLKDIV := 16_000_000 / 115200;
      Initialized := True;
   end Initialize;

   procedure Put (C : Character) is
   begin
      UART_Data := Unsigned_32 (Character'Pos (C));
   end Put;

Run-time build script

The last modification is to the Python scripts that create the run-times. To create a new run-time, we add a class that defines different properties like compiler switches or the list of files to include:

class PicoRV32(RiscV32):
    @property
    def name(self):
        return 'picorv32'

    @property
    def compiler_switches(self):
        # The required compiler switches
        return ['-march=rv32imc', '-mabi=ilp32']

    @property
    def has_small_memory(self):
        return True

    @property
    def loaders(self):
        return ['ROM']

    def __init__(self):
        super(PicoRV32, self).__init__()

        # Use the same base linker script as the HiFive1

        # Use the same startup code as the HiFive1


Building the run-time

Once all the changes are made, it is time to build the run-time. First, clone the bb-runtimes repository and run the script that creates the run-time:

$ git clone https://github.com/AdaCore/bb-runtimes
$ cd bb-runtimes
$ ./build_rts.py --bsps-only --output=temp picorv32

Compile the generated run-time:

$ gprbuild -P temp/BSPs/zfp_picorv32.gpr


And install:

$ gprinstall -p -f -P temp/BSPs/zfp_picorv32.gpr

This is it for the run-time. For a complete view of the changes, you can have a look at the commit on GitHub: here.

Example project

Now that we can compile Ada code for the PicoRV32, let’s work on an example project. I wanted to include a custom Verilog module, otherwise what’s the point of using an FPGA, right? So I made a peripheral that controls WS2812 RGB LEDs, also known as Neopixels. I won’t explain the details of this module; I will just say that digital logic is difficult for the software engineer that I am :)

The hardware/software interface is a memory mapped register that, once written to, sends a WS2812 data frame. To control a strip of multiple LEDs, the software just has to write to this register multiple times in a loop.

The example software is relatively simple. The Neopixel driver package is generic so that it can handle different lengths of LED strips. The address of the memory mapped register is also a parameter of the generic, so it is possible to have multiple peripherals controlling different LED strips.

The memory mapped register is defined using Ada’s representation clauses and Address attribute:

   type Pixel is record
      B, R, G : Unsigned_8;
   end record
     with Size => 32, Volatile_Full_Access;

   for Pixel use record
      B at 0 range 0 .. 7;
      R at 0 range 8 .. 15;
      G at 0 range 16 .. 23;
   end record;

   Data_Register : Pixel
     with Address => Peripheral_Base_Addr;

Then an HSV to RGB conversion function is used to implement different animations on the LED strip, like a candle simulation or a rainbow effect. And finally there are button inputs to select the animation and the intensity of the light.

Both hardware and software sources can be found in this repository. I recommend following the TinyFPGA-BX User Guide first to get familiar with the board and how the bootloader works.

Feeling inspired and want to start Making with Ada? We have the perfect challenge for you!
The Make with Ada competition, hosted by AdaCore, calls on embedded developers across the globe to build cool embedded applications using the Ada and SPARK programming languages and offers over €8000 in total prizes.

]]>
AdaCore major sponsor at HIS 2018 https://blog.adacore.com/adacore-major-sponsor-at-his-2018 Wed, 08 Aug 2018 09:51:00 +0000 Pamela Trevino https://blog.adacore.com/adacore-major-sponsor-at-his-2018

We are happy to announce that AdaCore, alongside Altran and Jaguar Land Rover, will be a major sponsor of the fifth edition of the renowned High Integrity Software Conference on the 6th of November in Bristol!

The core themes of the conference this year will be Assured Autonomous Systems, Hardware and Software Architectures, Cyber Security - People & Practice, and Languages & Applications. We will address the relationship between software vulnerability, security and safety, and the far-reaching impacts it can have.

This year's programme offers a wide and diverse selection of talks, including our keynotes presented by Simon Burton from Bosch and Dr. Andrew Blyth from the University of Wales. Speakers from ANSSI, Assuring Autonomy International Programme - University of York, BAE Systems, CyBOK, Imperial College, IRM, Meridian Mobile, NCSC, RealHeart, Rolls-Royce and the University of Edinburgh will be giving technical sessions on building trustworthy software for embedded, connected and infrastructure systems across a range of application domains.

For the full agenda and to register: https://www.his-2018.co.uk/

]]>
Learn.adacore.com is here https://blog.adacore.com/learn-adacore-com-is-here Wed, 25 Jul 2018 14:47:00 +0000 Fabien Chouteau https://blog.adacore.com/learn-adacore-com-is-here

We are very proud to announce the availability of our new Ada and SPARK learning platform: learn.adacore.com.
Following on from the AdaCoreU(niversity) e-learning platform, this website is the next step in AdaCore’s endeavour to provide a better online learning experience for the Ada and SPARK programming languages. This new website is designed for individuals who want to get up and running with Ada/SPARK, and also for teams or teachers looking for training or tutorial material based on Ada/SPARK.

In designing the site, we decided to evolve from the video-based approach used for AdaCoreU and instead have created text-based, interactive content to ease the learning experience. In light of this, AdaCoreU will be decommissioned in the coming weeks, but the course videos are already available on YouTube, and the base material, including slides, remains available on GitHub for people who want to use it for their courses or trainings.

The main benefit of a textual approach is the greater flexibility in how you advance through the course. Now you can easily pick and choose from the course material, with the opportunity to move on to more advanced sections but then refer back to previous content when needed. We also provide greater interactivity through code snippets embedded in a widget that allows you to compile, run and even prove your code (in the case of SPARK) directly from your Web browser. This allows you to experiment with the tools without having to install them, and to tweak the examples to gain a better understanding of what’s allowed and feasible.

As for course content production, this new format also allows greater interactivity with the community. The source code for the courses is in reST (reStructuredText) format, which makes it easy to edit and collaborate on. It’s also hosted on GitHub so that everyone can suggest fixes or ideas for new content.

For the release of learn.adacore.com, you will find two courses, an introduction to Ada and an introduction to SPARK, as well as an ebook “Ada for the C++ or Java Developer”.
In the future we plan to add advanced Ada and SPARK courses and also a dedicated course on embedded programming, so watch this space! Learn.adacore.com can also help you get started early and gain a competitive advantage ahead of this year’s Make with Ada competition.

]]>
GNAT Community 2018 is here! https://blog.adacore.com/gnat-community-2018 Tue, 26 Jun 2018 13:01:45 +0000 Emma Adby https://blog.adacore.com/gnat-community-2018

Calling all members of the Ada and SPARK community: we are pleased to announce that GNAT Community 2018 is here! adacore.com/download

What’s new?

BBC micro:bit first class support

We decided to adopt the micro:bit as our reference platform for teaching embedded Ada and SPARK. We chose the micro:bit for its worldwide availability, great value for a low price and the included hardware debugger. You can get one at:

We did our best to provide a smooth experience when programming the micro:bit in Ada or SPARK. Here is a quick guide on how to get started:

• Download and install GNAT arm-elf hosted on your platform: Windows, Linux or MacOS. This package contains the ARM cross compiler as well as the required Ada run-times

• Download and install GNAT native for your platform: Windows, Linux or MacOS. This package contains the GNAT Programming Studio IDE and an example to run on the micro:bit

• Start GNAT Programming Studio

• Click on “Create a new Project”

• Select the “Scrolling Text” project under “BBC micro:bit” and click Next

• Enter the directory where you want the project to be deployed and click Apply

• Plug in your micro:bit board with a USB cable, and wait for the system to recognize it. This can take a few seconds

• Back in GNAT Programming Studio, click on the “flash to board” icon

• That’s it!

The example provided only uses the LED matrix; for support of more advanced micro:bit features, please have a look at the Ada Drivers Library project.
RISC-V support

The RISC-V open instruction set and its ecosystem are getting more interesting every week, with big names of the tech industry investing in the platform as well as cheap prototyping boards becoming available for makers and hobbyists. After a first experiment last year, AdaCore decided to develop support for RISC-V in GNAT. In this GNAT Community release we provide support for bare-metal RISC-V 32-bit hosted on Linux, in particular for the HiFive1 board from SiFive. You will also find drivers for this board in the Ada Drivers Library project.

SPARK included in the package by default

For the first time in the community release, SPARK is now packaged with the native compiler, making it very easy for everyone to try it out. The three standard provers Alt-Ergo, CVC4 and Z3 are included.

Windows 64bit is finally here

By popular request, we decided to change GNAT Community Windows from 32bit to 64bit.

Arm-elf hosted on Mac

In this release we also add support for ARM bare-metal hosted on MacOS (previously only Windows and Linux). This includes the support for the BBC micro:bit mentioned above, as well as our usual STM32F4 and STM32F7 boards.

Feeling inspired and want to start Making with Ada today? Use the brand new GNAT Community 2018 to get a head start in this year’s Make with Ada competition! makewithada.org

]]>
Security Agency Uses SPARK for Secure USB Key https://blog.adacore.com/security-agency-uses-spark-for-secure-usb-key Mon, 25 Jun 2018 12:51:00 +0000 Yannick Moy https://blog.adacore.com/security-agency-uses-spark-for-secure-usb-key

ANSSI, the French national security agency, has published the results of their work since 2014 on designing and implementing an open-hardware & open-source USB key that provides defense-in-depth against vulnerabilities in the USB hardware, architecture, protocol and software stack. In this project, called WooKey, Ada and SPARK are key components for the security of the platform.
The conference paper (in English), the presentation slides (in French) and a video recording of their presentation (in French) are all available online on the website of the French security conference SSTIC 2018. The complete hardware designs and software code will be available in Q3 2018 in the GitHub project (currently empty).

Following the BadUSB vulnerability disclosure in 2014 (a USB key can be used to impersonate other devices, and permanently infect all computers it connects to, as well as devices connected to these computers), the only solution to defend against such attacks (to this day) has been to disable USB connections on computers. There are a number of commercial providers of secure USB keys, but their hardware/software stacks are proprietary, so it's not possible to evaluate their level of security. Shortly after the BadUSB disclosure, ANSSI set up an internal project to devise a secure USB key that would restore trust, by being fully open source, based on state-of-the-art practice, yet affordable for anyone to build and use. The results, four years later, were presented at the SSTIC 2018 conference on June 13th.

What is interesting is the key role played by the use of safe languages (Ada and SPARK) as well as formal verification (SPARK) to secure the most important services of the EwoK micro-kernel on the USB key, and the combination of these with measures to design a secure software architecture and secure hardware. They were also quite innovative in their adoption of Ada/SPARK, automatically and progressively replacing units in C with their counterparts in Ada/SPARK in their build system. Something worth noting is that the team discovered Ada/SPARK as part of this project, and managed to prove absence of runtime errors (no buffer overflows!) in their code easily.
Arnauld Michelizza from ANSSI will present their work on the EwoK micro-kernel and the software development process they adopted at the High Integrity Software conference in Bristol on November 6.

How Ada and SPARK Can Increase the Security of Your Software
https://blog.adacore.com/how-ada-and-spark-can-increase-the-security-of-your-software
Tue, 29 May 2018 12:15:14 +0000
Yannick Moy

There is a long-standing debate about which phase in the Software Development Life Cycle causes the most bugs: the specification phase or the coding phase? Together with information on the cost of fixing these bugs, answering this question would allow better allocation of QA (Quality Assurance) resources. Furthermore, the cost of bug fixes remains the subject of much debate. A recent study by NIST shows that, in the software industry at large, coding bugs cause the majority of security issues. They analyzed the provenance of security bugs across all publicly disclosed vulnerabilities in the National Vulnerability Database from 2008 to 2016, and discovered that coding bugs account for two thirds of the total. As they say:

The high proportion of implementation errors suggests that little progress has been made in reducing these vulnerabilities that result from simple mistakes, but also that more extensive use of static analysis tools, code reviews, and testing could lead to significant improvement.

Our view at AdaCore is that the above list of remedies lacks a critical component for "reducing these vulnerabilities that result from simple mistakes", and probably the most important one: pick a safer programming language! This might not be appropriate for all your software, but why not re-architect your system to isolate the most critical parts and progressively rewrite them in a safer programming language? Better still, design your next system this way in the first place.
What safer language to choose? One candidate is Ada, or its SPARK subset. How can they help? We've collected the answers to that question in a booklet to help people and teams who want to use Ada or SPARK to increase the security of their software. It is freely available here.

Taking on a Challenge in SPARK
https://blog.adacore.com/taking-on-a-challenge-in-spark
Tue, 08 May 2018 14:01:00 +0000
Johannes Kanig

Last week, the programmer Hillel posted a challenge on Twitter (the link points to a partial postmortem of the provided solutions) for someone to prove correct implementations of three small programming problems: Leftpad, Unique, and Fulcrum. This was a good opportunity to compare the SPARK language, with its expressiveness and proof power, to other systems and paradigms, so I took on the challenge. The good news is that I was able to prove all three solutions and that the SPARK proofs of each complete in no more than 10 seconds. I also believe the Fulcrum solution in particular shows some aspects of SPARK that are especially nice. I will now explain my solutions to each problem, briefly for Leftpad and Unique and in detail for Fulcrum. At the end, I discuss my takeaways from this challenge.

Leftpad

Hillel mentioned that the inclusion of Leftpad in the challenge was something of a joke. A retracted JavaScript package that implemented Leftpad famously broke thousands of projects back in 2016. Part of the irony was that Leftpad is so simple one shouldn't depend on a package for this functionality. The specification of Leftpad, according to Hillel, is as follows:

Takes a padding character, a string, and a total length, returns the string padded to that length with that character. If length is less than the length of the string, does nothing.

It is always helpful to start by translating the specification into a SPARK contract.
To distinguish between the two cases (padding required or not), we use contract cases, and arrive at this specification (see the full code in this github repository):

   function Left_Pad
     (S : String; Pad_Char : Character; Len : Natural) return String
   with Contract_Cases =>
     (Len > S'Length =>
        Left_Pad'Result'Length = Len
          and then
        (for all I in Left_Pad'Result'Range =>
           Left_Pad'Result (I) =
             (if I <= Len - S'Length then Pad_Char
              else S (I - (Len - S'Length + 1) + S'First))),
      others => Left_Pad'Result = S);

In the case where padding is required, the spec also nicely shows how the result string is composed of both padding chars and chars from the input string. The implementation in SPARK is of course very simple; we can even use an expression function to do it:

   function Left_Pad
     (S : String; Pad_Char : Character; Len : Natural) return String
   is ((1 .. Len - S'Length => Pad_Char) & S);

Unique

The problem description, as defined by Hillel:

Takes a sequence of integers, returns the unique elements of that list. There is no requirement on the ordering of the returned values.

An explanation in this blog wouldn't add anything to the commented code, so I suggest you check out the code for my solution directly.

Fulcrum

The Fulcrum problem was the heart of the challenge. Although the implementation is also just a few lines, quite a lot of specification work is required to get it all proved. The problem description, as defined by Hillel:

Given a sequence of integers, returns the index i that minimizes |sum(seq[..i]) - sum(seq[i..])|. Does this in O(n) time and O(n) memory.

(Side note: It took me quite a while to notice it, but the above notation seq[..i] apparently means the slice up to and excluding the value at index i. I have taken it instead to mean the slice up to and including the value at index i. Consequently, I used seq[i+1..] for the second slice. This doesn't change the nature or difficulty of the problem.)
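For readers who want to experiment with the two contract cases outside of SPARK, here is a small Python model of the same specification. This is an illustration only, not the proved code; the function name and signature are chosen for this sketch.

```python
def left_pad(s: str, pad_char: str, length: int) -> str:
    # First contract case: length exceeds the string length, so pad on the left.
    if length > len(s):
        return pad_char * (length - len(s)) + s
    # "others" case: the string is returned unchanged.
    return s
```

Running `left_pad("foo", "*", 5)` yields `"**foo"`, matching the case where the first `Len - S'Length` characters are the pad character and the rest is the input string.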
I'm pleased with the solution I arrived at, so I will present it in detail in this blog post. It has two features that I think none of the other solutions has:

• It runs in O(1) space, compared to the O(n) permitted by the problem description and required by the other solutions;
• It uses bounded integers and proves absence of overflow; all other solutions use unbounded integers to side-step this issue.

Again, our first step is to transform the problem statement into a subprogram specification (please refer to the full solution on github for all the details: there are many comments in the code). For this program, we use arrays to represent sequences of integers, so we first need an array type and a function that can sum all integers in an array. My first try looked like this:

   type Seq is array (Integer range <>) of Integer;

   function Sum (S : Seq) return Integer is
     (if S'Length = 0 then 0
      else S (S'First) + Sum (S (S'First + 1 .. S'Last)));

So we are just traversing the array via recursive calls, summing up the cells as we go. Assuming an array S of type Seq and an array index I, we could now write the required difference between the left and right sums as follows (assuming S is non-empty, which we assume throughout):

   abs (Sum (S (S'First .. I)) - Sum (S (I + 1 .. S'Last)))

However, there are two problems with this code, and both are specific to how SPARK works. The first problem could be seen as a limitation of SPARK: SPARK allows recursive functions such as Sum, but can't use them to prove code and specifications that refer to them. This would result in a lot of unproved checks in our code. But my colleague Claire showed me a really nice workaround, which I now present. The idea is not to compute a single sum value, but instead an array of partial sums, from which we can later select the partial sum we want.
Because such an array can't be computed in a single expression, we define a new function Sum_Acc as a regular function (not an expression function) with a postcondition:

   function Sum_Acc (S : Seq) return Seq
   with Post =>
     (Sum_Acc'Result'Length = S'Length
        and then Sum_Acc'Result'First = S'First
        and then Sum_Acc'Result (S'First) = S (S'First)
        and then
          (for all I in S'First + 1 .. S'Last =>
             Sum_Acc'Result (I) = Sum_Acc'Result (I - 1) + S (I)));

The idea is that each cell of the result array contains the sum of the input array cells, up to and including the corresponding cell in the input array. For an input array (1,2,3), the function would compute the partial sums (1,3,6). The last value is always the sum of the entire array. In the postcondition, we express that the result array has the same length and bounds as the input array, that the first cell of the result array is always equal to the first cell of the input array, and that each following cell is the sum of the previous cell of the result array plus the input cell at the same index. The implementation of this Sum_Acc function is straightforward and can be found in the code on github. We also need a Sum_Acc_Rev function which computes the partial sums starting from the end of the array. It is almost the same as Sum_Acc, but the initial (in fact last) value of the array will be zero, owing to the asymmetric definition of the two sums in the initial problem description. You can also find its specification and implementation in the github repository. For our array S and index I, the expression to compute the difference between left and right sum now becomes:

   abs (Sum_Acc (S) (I) - Sum_Acc_Rev (S) (I))

The second problem is that we are using Integer both as the type of the array cells and as the sum result, but Integer is a bounded integer type (32 or 64 bits, depending on your platform), so the sum function can overflow!
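As a quick sanity check of the partial-sums idea, here is what Sum_Acc and Sum_Acc_Rev compute, sketched in plain Python. This is an illustration only; the proved versions are the SPARK functions described above, and the function names here mirror theirs for readability.

```python
def sum_acc(seq):
    # Each output cell holds the sum of the input up to and including
    # that index, so sum_acc([1, 2, 3]) == [1, 3, 6].
    out = []
    total = 0
    for x in seq:
        total += x
        out.append(total)
    return out

def sum_acc_rev(seq):
    # Right-to-left partial sums; the last cell is 0, matching the
    # asymmetric split sum(seq[..i]) vs sum(seq[i+1..]).
    out = [0] * len(seq)
    total = 0
    for i in range(len(seq) - 1, 0, -1):
        total += seq[i]
        out[i - 1] = total
    return out
```

For (1,2,3) these produce (1,3,6) and (5,3,0): at index 1 (0-based), the left sum 3 and right sum 3 balance exactly, which is the fulcrum we will look for next.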
All other Fulcrum solutions referred to in Hillel's summary post use unbounded integers, so they avoid this issue. But in many contexts, unbounded integers are unacceptable, because they are unpredictable in space and time requirements and require dynamic memory handling. This is why I decided to increase the difficulty of the Fulcrum problem a little and include a proof of absence of overflow. To solve the overflow problem, we need to bound both the size of the values to be summed and how many of them we can sum. We also need to take negative values into account, so that we don't exceed the minimum value either. Luckily, range types in SPARK make this simple:

   subtype Int is Integer range -1000 .. 1000;
   subtype Nat is Integer range 1 .. 1000;

   type Seq is array (Nat range <>) of Int;
   type Partial_Sums is array (Nat range <>) of Integer;

We will use Int as the type for the contents of arrays, and Nat for the array indices (effectively limiting the size of arrays to at most 1000). The sums will still be calculated in Integer, so we need to define a new array type to hold the partial sums (we need to change the return type of the partial sums functions to Partial_Sums to make this work). So if we were to sum the largest possible array with 1000 cells, containing only the highest (or lowest) value 1000 (or -1000), the absolute value of the sum would not exceed 1 million, which easily fits into 32 bits. Of course, we could also choose different, and much larger, bounds here. We can now formulate the specification of Find_Fulcrum as follows:

   function Find_Fulcrum (S : Seq) return Nat
   with Pre  => S'Length > 0,
        Post =>
          (Find_Fulcrum'Result in S'Range
             and then
           (for all I in S'Range =>
              abs (Sum_Acc (S) (I) - Sum_Acc_Rev (S) (I)) >=
              abs (Sum_Acc (S) (Find_Fulcrum'Result) -
                   Sum_Acc_Rev (S) (Find_Fulcrum'Result))));

The implementation of Fulcrum

The Sum_Acc and Sum_Acc_Rev functions we already defined suggest a simple solution to Fulcrum that goes as follows: 1.
call Sum_Acc and Sum_Acc_Rev and store the results; 2. compute the index where the difference between the two arrays is smallest. The problem with this solution is that step (1) takes O(n) space, but we promised to deliver a constant-space solution! So we need to do something else. In fact, we notice that every program that calls Sum_Acc and Sum_Acc_Rev is already O(n) in space, so we should never call these functions outside of specifications. The Ghost feature of SPARK lets the compiler check this for us. Types, objects and functions can be marked ghost, and such ghost entities can only be used in specifications (like the postcondition above, or loop invariants and other intermediate assertions), but not in code. This makes sure that we don't accidentally call these functions and slow down our code. Marking a function ghost is done simply by adding with Ghost to its declaration. The constant-space implementation idea is that if, for some index J - 1, we have the partial sums Left_Sum = Sum_Acc (J - 1) and Right_Sum = Sum_Acc_Rev (J - 1), we can compute the values for the next index J by simply adding S (J) to Left_Sum and subtracting it from Right_Sum. This simple idea gives us the core of the implementation:

   for I in S'First + 1 .. S'Last loop
      Left_Sum  := Left_Sum + S (I);
      Right_Sum := Right_Sum - S (I);
      if abs (Left_Sum - Right_Sum) < Min then
         Min   := abs (Left_Sum - Right_Sum);
         Index := I;
      end if;
   end loop;
   return Index;

To understand this code, we also need to know that Min holds the current minimal difference between the two sums and Index gives the array index where this minimal difference occurred. In fact, the previous explanations of the code can be expressed quite nicely using this loop invariant (it also holds at the beginning of the loop):

   pragma Loop_Invariant
     (Left_Sum = Sum_Acc (S) (I - 1)
        and then Right_Sum = Sum_Acc_Rev (S) (I - 1)
        and then Min = abs (Sum_Acc (S) (Index) - Sum_Acc_Rev (S) (Index))
        and then
          (for all K in S'First .. I - 1 =>
             abs (Sum_Acc (S) (K) - Sum_Acc_Rev (S) (K)) >=
             abs (Sum_Acc (S) (Index) - Sum_Acc_Rev (S) (Index))));

The only part that's missing is the initial setup, so that the above conditions hold for I = S'First. For the Right_Sum variable, this requires a traversal of the entire array to compute the initial right sum. For this we have written a helper function Sum which is O(n) time and O(1) space. So we end up with this initialization code for Find_Fulcrum:

   Index     : Nat     := S'First;
   Left_Sum  : Integer := S (S'First);
   Right_Sum : Integer := Sum (S);
   Min       : Integer := abs (Left_Sum - Right_Sum);

and it can be seen that these initial values establish the loop invariants for the first iteration of the loop (where I = S'First + 1, so I - 1 = S'First).

Some metrics

It took me roughly 15 minutes to come up with the Leftpad proof. I don't have exact numbers for the two other problems, but I would guess roughly one hour for Unique and 2-3 hours for Fulcrum. Fulcrum is about 110 lines of code, including specifications but excluding comments and blank lines. An implementation without any contracts would be about 35 lines, so we have a threefold overhead of specification to code, though that's typical for specifications of small algorithmic problems like Fulcrum. All proofs are done automatically, and SPARK verifies each example in less than 10 seconds.

Some thoughts about the challenge

First of all, many thanks to Hillel for starting that challenge and to Adrian Rueegsegger for bringing it to my attention. It was fun to do, and I believe SPARK did reasonably well in this challenge. Hillel's motivation was to counter exaggerated praise of functional programming (FP) over imperative programming (IP). So he proved these three examples in imperative style and challenged the functional programming community to do the same. In this context, doing the problems in SPARK was probably beside the point, because it's not a functional language.
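Before moving on, the constant-space Fulcrum algorithm described above can be condensed into a short executable Python sketch. None of the SPARK contracts or ghost functions carry over; the right sum is initialized directly from the spec's definition (sum of everything after the first element), so this is an illustration of the algorithm, not a port of the proved code.

```python
def find_fulcrum(seq):
    # Precondition from the SPARK spec: the sequence is non-empty.
    assert len(seq) > 0
    left_sum = seq[0]                 # sum of seq[..0], inclusive
    right_sum = sum(seq) - seq[0]     # sum of seq[1..]
    best = abs(left_sum - right_sum)
    index = 0
    for i in range(1, len(seq)):
        # Slide the split point right in O(1) per step.
        left_sum += seq[i]
        right_sum -= seq[i]
        if abs(left_sum - right_sum) < best:
            best = abs(left_sum - right_sum)
            index = i
    return index
```

For [1, 2, 3, 4, 5] the prefix sums are 1, 3, 6, 10, 15 and the suffix sums 14, 12, 9, 5, 0, so the minimal difference |6 - 9| = 3 occurs at index 2 (0-based).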
Going beyond the FP vs IP question asked in the challenge, I think we can learn a lot by looking at the solutions and the discussion around the challenge. This blog post by Neelakantan Krishnaswami argues that the real issue is the combination of language features, in particular aliasing in combination with side effects. If you have both, verification is hard. One approach is to accept this situation and deal with it. In Frama-C, a verification tool for C, it is common to write contracts that separate memory regions. Some tools are based on separation logic, which directly features separating conjunction in the specification language. Both result in quite complex specifications, in my opinion. Or you can change the initial conditions and remove language features. Functional programming removes side effects to make aliasing harmless. This has all kinds of consequences, for example the need for a garbage collector and the inability to use imperative algorithms from the literature. SPARK keeps side effects but removes aliasing, essentially by excluding pointers and a few other language rules. SPARK makes up for the loss of pointers by providing built-in arrays and parameter modes (two very common usages of pointers in C), but shared data structures still remain impossible to write in pure SPARK: one needs to leave the SPARK subset and go to full Ada for that. Rust keeps pointers, but only non-aliasing ones via its borrow checker; however, formal verification for Rust is still in very early stages as far as I know. The SPARK team is also working on support for non-aliasing pointers. Beyond language features, another important question is the applicability of a language to a domain. Formal verification of code matters most where a bug would have a big impact. Such code can be found in embedded devices, for example safety-critical code that makes sure the device is safe to use (think of airplanes, cars, or medical devices). 
Embedded programming, in particular safety-critical embedded programming, is special because different constraints apply. For example, execution times and memory usage of the program must be very predictable, which excludes languages that have a managed approach to memory (memory usage becomes less predictable) with a GC (often, the GC can kick in at any time, which makes execution times unpredictable). In these domains, functional languages can't really be applied directly (but see the use of Haskell in sel4). Unbounded integers can't be used either, because of the same issue: memory usage can grow if some computation yields a very large result, and execution time can vary as well. These issues were the main motivation for me to provide an O(1) space solution to the Fulcrum problem that uses bounded integers, and to prove absence of overflow. Programming languages aren't everything either. Another important issue is tooling, in particular proof automation. Looking at the functional solutions of Fulcrum (linked from Hillel's blog post), they contain a lot of manual proofs. The Agda solution is very small despite this fact, though it uses a simple quadratic algorithm; I would love to see a variant that's linear. I believe that for formal verification to be accepted in industrial projects, most proofs must be completed automatically, though some manual effort is acceptable for a small percentage of the proof obligations. The Dafny and SPARK solutions are the only ones (as far as I could see) that fare well in this regard. Dafny is well-known for its excellent proof automation via Boogie and Z3. SPARK also does very well here, all proofs being fully automatic.

PolyORB now lives on Github
https://blog.adacore.com/polyorb-now-lives-on-github
Wed, 18 Apr 2018 15:52:00 +0000
Thomas Quinot

PolyORB, AdaCore's versatile distribution middleware, now lives on Github.
Its new home is https://github.com/AdaCore/polyorb

PolyORB is a development toolsuite and a library of runtime components that implements several distribution models, including CORBA and the Ada 95 Distributed Systems Annex. Originally developed as part of academic research at Telecom ParisTech, it became part of the GNAT Pro family in 2003. Since then, it has been used in a number of industrial applications in a wide variety of domains such as:

* air traffic flow management
* enterprise document management
* scientific data processing in particle physics experiments

AdaCore has always been committed to involving the user community in the development of PolyORB. Over the past 15 years, many contributions from industrial as well as hobbyist users have been integrated, and community releases were previously made available in conjunction with GNAT GPL. Today we are pleased to further this community engagement and renew our commitment to an open development process by making the PolyORB repository (including full history) available on Github. This will allow users of GNAT GPL to benefit from the latest developments and contribute fixes and improvements. We look forward to seeing your issues and pull requests on this repository!

SPARKZumo Part 2: Integrating the Arduino Build Environment Into GPS
https://blog.adacore.com/sparkzumo-part-2-integrating-the-arduino-build-environment-into-gps
Wed, 04 Apr 2018 04:00:00 +0000
Rob Tice

This is part 2 of the SPARKZumo project, where we go through how to integrate a CCG application with other source code and how to create GPS plugins to customize features like automating builds and flashing hardware. To read more about the software design of the project, visit the other blog post here.
The Build Process

At the beginning of our build process we have a few different types of source files that we need to bring together into one binary: Ada/SPARK, C++, C, and an Arduino sketch. During a typical Arduino build, the build system converts the Arduino sketch into valid C++ code, brings in any libraries (user and system) that are included in the sketch, synthesizes a main, compiles and links all that together with the Arduino runtime and selected BSP, and generates the resulting executable binary. The only step we are adding to this process is to run CCG on our SPARK code to generate a C library that we can pass to the Arduino build as a valid Arduino library. The Arduino sketch then pulls the resulting library into the build via an include. **The CCG tool is available as part of a GNATPro product subscription and is not included with the GNAT Community release.**

Build Steps

From the user's perspective, the steps necessary to build this application are as follows:

1. Run CCG on the SPARK/Ada code to produce C files and Ada Library Information files, or ali files. For more information on these files, see the GNAT Compilation Model documentation.
2. Copy the resulting C files into a directory structure valid for an Arduino library. We will use the lib directory in the main repo to house the generated Arduino library.
3. Run c-gnatls on the ali files to determine which runtime files our application depends on.
4. Copy those runtime files into the Arduino library structure.
5. Make sure our Arduino sketch includes the header files generated by the CCG tool.
6. Run the arduino-builder tool with the appropriate options to tell the tool where our library lives and which board we are compiling for. The arduino-builder tool will use the .build directory in the repo to stage the build.
7. Flash the result of the compilation to our target board.

That seems like a lot of work to do every time we need to make a change to our software!
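As a preview of the automation, here is a hypothetical Python sketch of the commands behind steps 3 and 6. The tool names (c-gnatls, arduino-builder) come from the text, but the paths, option values, and helper names are placeholders assumed for this sketch, not the project's actual plugin code.

```python
import subprocess

# Placeholder layout for the repository (assumptions for this sketch).
REPO = "."
BUILD_DIR = REPO + "/.build"
SKETCH = REPO + "/SPARKZumo.ino"

def gnatls_cmd(ali_file):
    # Step 3: ask c-gnatls which runtime units an ali file depends on.
    return ["c-gnatls", "-d", "-a", "-s", ali_file]

def builder_cmd():
    # Step 6: compile the sketch, pointing arduino-builder at a
    # build.options.json prepared beforehand and a staging directory.
    return ["arduino-builder",
            "-compile",
            "-build-options-file", REPO + "/conf/build.options.json",
            "-build-path", BUILD_DIR,
            SKETCH]

def run(cmd):
    # Thin wrapper so the driver script can fail loudly on errors.
    subprocess.check_call(cmd)
```

A driver would call `run(gnatls_cmd(...))` for each ali file and then `run(builder_cmd())`; the real flag set should be checked against your arduino-builder version.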
Since these steps are the same every time, we can automate them. Because we would like this to work on both Windows and Linux, we should use a fairly host-agnostic scripting language. It would also be nice to integrate this workflow into GPS so that we can develop our code, prove our code, and build and flash our code without leaving our IDE. It is an Integrated Development Environment, after all.

Configuration Files

The arduino-builder program is the command-line version of the Arduino IDE. When you build an application with the Arduino IDE, it creates a build.options.json file with the options you select from the IDE. These options include the location of any user libraries, the hardware to build for, where the toolchain lives, and where the sketch lives. We can pass the same options to the arduino-builder program, or we can pass it the location of a build.options.json file. For this application I put a build.options.json file in the conf directory of the repository. This file should be configured properly for your build system. The best way I have found to get this file configured properly is to install the Arduino IDE and build one of the example applications. Then find the build.options.json file generated by the IDE and copy it into the conf directory of the repository. You then only need to modify:

1. The "otherLibrariesFolders" to point to the absolute path of the lib folder in the repo.
2. The "sketchLocation" to point at the SPARKZumo.ino file in the repo.

The other conf files in the conf directory are there to configure the flash utilities. When flashing the AVR on the Arduino Uno, the avrdude flash utility is used. This application takes the information from the flash.yaml file and the path to the avrdude.conf file to configure the flash command. Avrdude uses this to inform the flashing utility about the target hardware.
The HiFive board uses openocd as its flashing utility. The openocd.cfg file has all the necessary configuration information that is passed to the openocd tool for flashing.

The GPS Plugin

[DISCLAIMER: This guide assumes you are using version 18.1 or newer of GPS]

Under the hood, GPS, or the GNAT Programming Studio, is a combination of Ada, graphical frameworks, and Python scripting utilities. Using the Python plugin interface, it is very easy to add functionality to our GPS environment. For this application we will add some buttons and menu items to automate the process mentioned above. We will only be using a small subset of the power of the Python interface. For a complete guide to what is possible you can visit the Customizing and Extending GPS and Scripting API Reference for GPS sections of the GPS User's Guide.

Plugin Installation Locations

Depending on your use case, you can add Python plugins in a few locations to bring them into your GPS environment. There are already a handful of plugins that come with the GPS installation. You can find the list of these plugins by going to Edit->Preferences and navigating to the Plugin tab (near the bottom of the preferences window on the left sidebar). Because these plugins are included with the installation, they live under the installation directory in <installation directory>/share/gps/plug-ins. If you would like to modify your installation, you can add your plugins here and reload GPS. They will then show up in the plugin list. However, if you reinstall GPS, it will overwrite your plugin! There is a better place to put your plugins so that they won't disappear when you update your GPS installation. GPS adds a folder to your home directory which includes all your user-defined settings for GPS, such as your color theme, font settings, pretty printer settings, etc. This folder, by default, lives in <user's home directory>/.gps.
If you navigate to this folder you will see a plug-ins folder where you can add your custom plugins. When you update your GPS installation, this folder persists. Depending on your application, there may be an even better place to put your plugin. For this specific application we really only want this added functionality when we have the SPARKzumo project loaded. So ideally, we want the plugin to live in the same folder as the project, and to load only when we load the project. To get this functionality, we can name our plugin <project file name>.ide.py and put it in the same directory as our project. When GPS loads the project, it will also load the plugin. For example, our project file is named zumo.gpr, so our plugin should be called zumo.ide.py. The source for the zumo.ide.py file is located here.

The Plugin Skeleton

When GPS loads our plugin it will call the initialize_project_plugin function. We should implement something like this to create our first button:

   import GPS
   import gps_utils

   class ArduinoWorkflow:

       def __somefunction(self):
           # do stuff here
           pass

       def __init__(self):
           gps_utils.make_interactive(
               callback=self.__somefunction,
               category="Build",
               name="Example",
               toolbar='main',
               menu='/Build/Arduino/' + "Example",
               description="Example")

   def initialize_project_plugin():
       ArduinoWorkflow()

This simple class will create a button and a menu item with the text Example. When we click this button or menu item, it will call back to our somefunction function. Our actual plugin creates a few buttons and menu items that look like this:

Task Workflows

Now that we have the ability to run some scripts by clicking buttons, we are all set! But there's a problem: when we execute a script from a button, and the script takes some time to perform its actions, GPS hangs waiting for the script to complete. We really should be executing our script asynchronously so that we can still use GPS while we are waiting for the tasks to complete.
Python has a nice feature called coroutines which allows us to run some tasks asynchronously. We could be super fancy and implement these coroutines using generators! Or…

ProcessWrapper

GPS has already done this for us with the task_workflow interface. The task_workflow call wraps our function in a generator and will asynchronously execute parts of our script. We can modify our somefunction function now to look like this:

   def __somefunction(self, task):
       task.set_progress(0, 1)
       try:
           proc = promises.ProcessWrapper(["script", "arg1", "arg2"],
                                          spawn_console="")
       except:
           self.__error_exit("Could not launch script.")
           return
       ret, output = yield proc.wait_until_terminate()
       if ret != 0:
           self.__error_exit("Script returned an error.")
           return
       task.set_progress(1, 1)

In this function we are going to execute a script called script and pass 2 arguments to it. We wrap the call to the script in a ProcessWrapper which returns a promise. We then yield on the result. The process will run asynchronously, and the main thread will transfer control back to the main process. When the script is complete, the yield returns the stdout and exit code of the process. We can even feed some information back to the user about the progress of the background processes using the task.set_progress call. This registers the task in the task window in GPS. If we have many tasks to run, we can update the task window after each task to tell the user whether we are done yet.

TargetWrapper

The ProcessWrapper interface is nice if we need to run an external script, but what if we want to trigger the build or one of the GNAT tools?

Triggering CCG

Just for that, there's another interface: TargetWrapper. To trigger the build tools, we can run something like this:

   builder = promises.TargetWrapper("Build All")
   retval = yield builder.wait_on_execute()
   if retval != 0:
       self.__error_exit("Failed to build all.")
       return

With this code, we are triggering the same action as the Build All button or menu item.
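Outside of GPS, the generator-driven pattern behind task_workflow can be illustrated with a toy driver. Promise and run_workflow here are stand-ins invented for this sketch, not GPS's real promises module or API; the point is only to show how yielding a promise suspends the workflow and how the driver resumes it with the result.

```python
class Promise:
    # Toy promise: pre-resolved with a value, standing in for the
    # result of an asynchronous process.
    def __init__(self, value):
        self.value = value

def run_workflow(gen):
    # Drive the generator: resolve each yielded Promise and send its
    # value back in, mimicking how GPS resumes a workflow when an
    # external process finishes.
    result = None
    try:
        promise = next(gen)
        while True:
            promise = gen.send(promise.value)
    except StopIteration as stop:
        result = stop.value
    return result

def workflow():
    ret = yield Promise(0)   # e.g. wait for a build to finish
    if ret != 0:
        return "build failed"
    ret = yield Promise(0)   # e.g. wait for the flash utility
    return "flashed" if ret == 0 else "flash failed"
```

Calling `run_workflow(workflow())` walks both steps and returns "flashed"; in GPS, the driver resumes the generator only when the real subprocess terminates, which is what keeps the IDE responsive.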
Triggering GNATdoc

We can also trigger the other tools within the GNAT suite using the same technique. For example, we can run the GNATdoc tool against our project to generate the project documentation:

   gnatdoc = promises.TargetWrapper("gnatdoc")
   retval = yield gnatdoc.wait_on_execute(
       extra_args=["-P", GPS.Project.root().file().path, "-l"])
   if retval != 0:
       self.__error_exit("Failed to generate project documentation.")
       return

Here we are calling gnatdoc with the arguments listed in extra_args. This command will generate the project documentation and put it in the directory specified by the Documentation_Dir attribute of the Documentation package in the project file. In this case, I am putting the docs in the docs folder of the repo so that my GitHub repo can serve them via a GitHub Pages website.

Accessing Project Configuration

The file that drives the GNAT tools is the GNAT project file, or gpr file. This file has all the information necessary for GPS and CCG to process the source files and build the application. We can access all of this information from the plugin as well, to find where the source files live, where the object files go, and which build configuration we are using. For example, to access the list of source files for the project we can use the following Python command: GPS.Project.root().sources(). Another important piece of information that we would like to get from the project file is the current value assigned to the "board" scenario variable. This will tell us whether we are building for the Arduino target or the HiFive target. This variable will change the build configuration that we pass to arduino-builder and which flash utility we call. We can access this information using the following command: GPS.Project.root().scenario_variables(). This will return a dictionary of all scenario variables used in the project.
We can then access the "board" scenario variable using the typical Python dictionary syntax: GPS.Project.root().scenario_variables()['board'].

Determining Runtime Dependencies

Because we are using the Arduino build system to build the output of the CCG tool, we need to include the runtime dependency files used by our CCG application in the Arduino library directory. To detect which runtime files we are using, we can run the c-gnatls command against the ali files generated by the CCG tool and parse its output. The output of c-gnatls on one file looks something like this:

$ c-gnatls -d -a -s obj/geo_filter.ali
types.ads

When we parse this output, we have to make sure we run c-gnatls against all of the ali files generated by CCG, strip out any listed files that are already part of our sources, and remove any duplicate dependencies. The c-gnatls tool also lists the Ada versions of the runtime files rather than the C versions, so we need to determine the C equivalents and copy those into our Arduino library folder. The __get_runtime_deps function is responsible for all of this work.
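As a rough illustration of that post-processing (the runtime_deps function and its name-mapping rule here are hypothetical — they are not the actual __get_runtime_deps implementation):

```python
def runtime_deps(gnatls_output, project_sources):
    """Sketch of the steps described above: deduplicate the c-gnatls
    listings, drop files that are already project sources, and map the
    Ada runtime file names to assumed C counterparts (.h/.c)."""
    deps = set()
    for line in gnatls_output.splitlines():
        name = line.strip()
        if not name or name in project_sources:
            continue  # blank line, or one of our own sources
        stem = name.rsplit(".", 1)[0]
        # the CCG runtime ships C versions of the Ada runtime files
        deps.update({stem + ".h", stem + ".c"})
    return sorted(deps)
```

Running it over the sample output above, with geo_filter.ads as one of our own sources, leaves only the C equivalents of types.ads to copy.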

Generating Lookup Tables

If you had a chance to look at the first blog post in this series, I talked a bit about the code in this application that filters discrete states using a graph filter. This involved mapping some states onto some physical geometry and sectioning off areas belonging to the different states. The outcome was to map each point in a 2D graph to some state using a lookup table.

To generate this lookup table I used a Python library called shapely to compute the necessary geometry and map points to states. Originally, I had this as a separate utility sitting in the utils folder of the repo and would copy its output into the geo_filter.ads file by hand. Eventually, I was able to bring this utility into the plugin workflow using a few interesting features of GPS.

GPS includes pip

Even though GPS has the Python env embedded in it, you can still bring in outside packages using the pip interface. The syntax for installing an external dependency looks something like:

import pip
ret = pip.main(["install"] + dependency)

Where dependency is a list of the packages you are looking to install. In the case of this plugin, I only need the shapely library, which I install when the GPS plugin is initialized.
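Since pip.main re-installs unconditionally, a plugin may want to guard the call. A small sketch under that assumption, matching the pip.main interface shown above (the ensure_package helper is invented, not part of GPS):

```python
def ensure_package(module_name, package_name=None):
    """Install a package with pip only if it cannot already be imported."""
    import importlib
    try:
        importlib.import_module(module_name)
        return 0  # already available, nothing to do
    except ImportError:
        import pip
        # pip.main follows the older pip API used in the snippet above
        return pip.main(["install", package_name or module_name])
```

For example, ensure_package("shapely") would only invoke pip the first time the plugin runs in a given environment.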

Libadalang

The Libadalang library is now included with GPS and can be used inside your plugin. Using the Libadalang interface, I was able to access the value of user-defined named numbers in the Ada files. This value was then passed to the shapely application to compute the necessary geometry.

import libadalang as lal

ctx = lal.AnalysisContext()
unit = ctx.get_from_file(file_to_edit)
myVarNode = unit.root.findall(lambda n: n.is_a(lal.NumberDecl) and n.f_ids.text == 'my_var')
value = int(myVarNode[0].f_expr.text)

This snippet creates a new Libadalang analysis context, loads the information from a file, and searches for a named number declaration called 'my_var'. The value assigned to 'my_var' is then stored in our variable value.

I was then able to access the location where I wanted to put the output of the shapely application using Libadalang:

array_node = unit.root.findall(lambda n: n.is_a(lal.ObjectDecl) and n.f_ids.text=='my_array')
agg_start_line = int(array_node[0].f_default_expr.sloc_range.start.line)
agg_start_col = int(array_node[0].f_default_expr.sloc_range.start.column)
agg_end_line = int(array_node[0].f_default_expr.sloc_range.end.line)
agg_end_col = int(array_node[0].f_default_expr.sloc_range.end.column)

This gave me the line and column number of the start of the array aggregate initializer for the lookup table ‘my_array’.

Editing Files in GPS from the Plugin

Now that we have the computed lookup table, we could use the typical Python file-open mechanism to edit the file at the location obtained from Libadalang. But since we are already in GPS, we can just use the GPS.EditorBuffer interface instead. Using the information from our shapely application and the line and column information obtained from Libadalang, we can do this:

buf = GPS.EditorBuffer.get(GPS.File(file_to_edit))
agg_start_cursor = buf.at(agg_start_line, agg_start_col)
agg_end_cursor = buf.at(agg_end_line, agg_end_col)
buf.delete(agg_start_cursor, agg_end_cursor)
array_str = "(%s));" % ("(%s" % ("),\n(".join([', '.join([item for item in row]) for row in array])))
buf.insert(agg_start_cursor, array_str[agg_start_col - 1:])

First we open a buffer for the file that we want to edit. Then we create editor locations for the beginning and end of the current array aggregate, using the positions we obtained from Libadalang, and delete the old aggregate from the buffer. Finally, we turn the array we received from our shapely application into a string and insert it into the buffer.
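The nested joins that build array_str are dense; the same construction on plain data behaves like this (aggregate_image is an illustrative helper, not from the plugin):

```python
def aggregate_image(array):
    """Build the Ada aggregate text the same way array_str does above:
    items joined with ', ', rows joined with '),\n(', all wrapped in
    '((...));'."""
    rows = "),\n(".join(", ".join(item for item in row) for row in array)
    return "((%s));" % rows

# aggregate_image([["1", "2"], ["3", "4"]]) produces:
# ((1, 2),
# (3, 4));
```

That text is exactly what gets inserted at the aggregate's start location in the buffer.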

We have just successfully generated some Ada code from our GPS plugin!

Most probably, a plugin already exists in the GPS distribution that does something similar to what you want to do. For this plugin, I used the source of the plugin that enables flashing and debugging of bare-metal STM32 ARM boards as a starting point. This file can be found in your GPS installation at <install directory>/share/gps/support/ui/board_support.py. You can also see this file on the GPS GitHub repository here.

In most cases, it makes sense to search through the plugins that already exist to get a starting point for your specific application; you can then fill in the blanks from there. You can view the entire source of GPS on AdaCore’s GitHub repository.

That wraps up the overview of the build system for this application. The source for the project can be found here. Feel free to fork this project and create new and interesting things.

Happy Hacking!

]]>

One of the most criticized aspects of the Ada language throughout the years has been its outdated syntax. Fortunately, AdaCore decided to tackle this issue by implementing a new, modern syntax for Ada.

The major change is the use of curly braces instead of begin/end. Also the following keywords have been shortened:

• return becomes ret
• function becomes fn
• is becomes :
• with becomes include

For instance, the following function:

with Ada.Numerics;

function Fools (X : Float) return Float is
begin
   return X * Ada.Numerics.Pi;
end;

is now written:

include Ada.Numerics;

fn Fools (X : Float) ret Float :
{
   ret X * Ada.Numerics.Pi;
};

This modern syntax is a major milestone in the adoption of Ada. John Dorab recently discovered the qualities of Ada:

I have an eye condition that prevents me from reading code without curly braces. Thanks to this new syntax, I can now benefit from the advanced type system, programming by contract, portability, functional safety, [insert more cool Ada features here] of Ada. Also, it looks like most other programming languages, so it must be better.

This new syntax is also a boost in productivity for Ada developers. Mr Fisher testifies:

I write around 10 lines of code per day. With this new syntax I save up to 30 keystrokes. That’s a huge increase in my productivity! The code is less readable for debugging, code reviews and maintenance in general, but I write a little bit more of it.

The standardization effort related to this new syntax is expected to start in the coming year and to last a few years. In order to allow early adopters to get their hands on Ada without having to wait for the next standard, we have created a font that allows you to display Ada code with the new syntax:

The font contains other useful ligatures, like displaying the Ada assignment operator “:=” as “=”, and the Ada equality operator “=” as “==”. It was created from Courier New using Glyphr Studio and cloudconvert. It’s a work in progress, so feel free to extend it! It is attached below.

In the future, we also plan to go beyond a pure syntactic layer with Libadalang. For example, we could emulate the complex type promotions/conversions rules of C by inserting Unchecked_Conversion calls each time the programmer tries to convert a value to an incompatible type. If you have other ideas, please let us know in the comments below!

Attachments

]]>

There are a lot of DIY CNC projects out there (router, laser, 3D printer, egg drawing, etc.), but I have never seen a DIY CNC sandblaster. So I decided to make my own.

Hardware

The CNC frame is one of those cheap kits that you can get on eBay, for instance. Mine was around 200 euros, and it is actually good value for the price. I built the kit and then replaced the electronic controller with an STM32F469 Discovery board and an Arduino CNC shield.

For the sandblaster itself, my father and I hacked this simple solution made from a soda bottle and pipes/fittings that you can find in any hardware store.

The sand falls from the tank into a small tube mostly thanks to gravity. The sand tank still needs to be pressurised to keep air from coming up through the nozzle.

The sandblaster was then mounted on the CNC frame where the engraving spindle is supposed to be, and the sand tank is fixed above the machine. As you can briefly see in the video, I control the airflow manually, as I didn’t have a solenoid valve to make the machine fully autonomous.

Software

On the software side I re-used my Ada Gcode controller from a previous project. I still wanted to add something to it, so this time I used a board with a touch screen to create a simple control interface.

Conclusion

This machine is actually not very practical. The 1.5 litre soda bottle holds barely enough sand to write three letters, and the dust going everywhere will jam the machine after a few minutes of use. But this was a fun project nonetheless!

PS: Thank you dad for letting me use your workshop once again ;)

]]>

So you want to use SPARK for your next microcontroller project? Great choice! All you need is an Ada 2012 ready compiler and the SPARK tools. But what happens when an Ada 2012 compiler isn’t available for your architecture?

This was the case when I started working on a mini sumo robot based on the Pololu Zumo v1.2

The chassis is complete with independent left and right motors with silicone tracks, and a suite of sensors including an array of infrared reflectance sensors, a buzzer, a 3-axis accelerometer, magnetometer, and gyroscope. The robot’s control interface uses a pin-out and footprint compatible with Arduino Uno-like microcontrollers. This is super convenient, because I can use any Arduino Uno compatible board, plug it into the robot, and be ready to go. But the Arduino Uno is an AVR, and there isn’t a readily available Ada 2012 compiler for AVR… back to the drawing board…

Or…

What if we could still write SPARK code and compile it into C code, then use the Arduino toolchain to compile and link that code with the Arduino BSPs and runtimes? This would be ideal, because I wouldn’t need to worry about writing a BSP for the board I am using and could focus solely on the application layer. And I can use SPARK! Luckily, AdaCore has a solution for exactly this!

CCG to the rescue!

The Common Code Generator, or CCG, was developed to solve the issue where an Ada compiler is not available for a specific architecture, but a C compiler is readily available. This is the case for architectures like AVR, PIC, Renesas, and specialized DSPs from companies like TI and Analog Devices. CCG can take your Ada or SPARK code and “compile” it to a format that the manufacturer’s supplied C compiler can understand. With this technology, we now have all of the benefits of Ada or SPARK on any architecture. Note that the CCG tool is available as part of a GNAT Pro product subscription and is not included with the GNAT Community release.

Note that this is not fundamentally different from what’s already happening in a compiler today. Compilation is essentially a series of translations from one language to another, each one used for a specific optimization or analysis phase. In the case of GNAT, for example, the process is as follows:

1. The Ada code is first translated into a simplified version of Ada (called the expanded tree).

2. Then into the gcc tree format which is common to all gcc-supported languages.

3. Then into a format ideal for computing optimizations called gimple.

4. Then into a generic assembly language called RTL.

5. And finally to the actual target assembler.

With CCG, C becomes one of these intermediate languages, with GNAT taking care of the initial compilation steps and a target compiler taking care of the final ones. One important consequence of this is that the C code is not intended to be maintainable or modified. CCG is not a translator from Ada or SPARK to C, it’s a compiler, or maybe half a compiler.

There are some limitations to this, though, that are important to know; today they are mostly due to the fact that the technology is very young and targets a subset of Ada. Looking at the limitations more closely, they resemble the restrictions imposed by the SPARK language subset on a zero-footprint runtime. I would generally use the zero-footprint runtime in an environment where the BSP and runtime are supplied by a vendor or an RTOS, so this looks like a perfect opportunity to use CCG to develop SPARK code for an Arduino-supported board using the Arduino BSP and runtime support. For a complete list of supported and unsupported constructs, see the CCG User’s Guide.

Another benefit I get out of this setup is that I am using the Arduino framework as a hardware abstraction layer. Because I am generating C code and pulling in Arduino library calls, theoretically, I can build my application for many processors without changing my application code. As long as the board is supported by Arduino and is pin compatible with my hardware, my application will run on it!

Abstracting the Hardware

For this application I looked at targeting two different architectures: the Arduino Uno Rev 3, which has an ATmega328p on board, and a SiFive HiFive1, which has a Freedom E310 on board. These were chosen because they are pin compatible but massively different from a software perspective. The ATmega328p is an 8-bit AVR (with a 16-bit C int) and the Freedom E310 is a 32-bit RISC-V. The system word size isn’t even the same! The source code for the project is located here.

In order to abstract the hardware differences away, two steps had to be taken:

1. I used a target configuration file to tell the CCG tool how to represent data sizes during code generation. By default, CCG assumes word sizes based on the default for the host OS. To compile for the AVR, whose C int is 16 bits, I used the target.atp file located in the base directory to inform the tool about the layout of the hardware. The configuration file looks like this:

Bits_BE                       0
Bits_Per_Unit                 8
Bits_Per_Word                16
Bytes_BE                      0
Char_Size                     8
Double_Float_Alignment        0
Double_Scalar_Alignment       0
Double_Size                  32
Float_Size                   32
Float_Words_BE                0
Int_Size                     16
Long_Double_Size             32
Long_Long_Size               64
Long_Size                    32
Maximum_Alignment            16
Max_Unaligned_Field          64
Pointer_Size                 32
Short_Enums                   0
Short_Size                   16
Strict_Alignment              0
System_Allocator_Alignment   16
Wchar_T_Size                 16
Words_BE                      0
float         15  I  32  32
double        15  I  32  32

2. The bsp folder contains all of the differences between the two boards that were necessary to separate out. This is also where the Arduino runtime calls are pulled into the Ada code. For example, in bsp/wire.ads you can see many pragma Import declarations used to bring in the Arduino I2C calls located in wire.h.

In order to tell the project which version of these files to use during the compilation, I created a scenario variable in the main project, zumo.gpr

type Board_Type is ("uno", "hifive");
Board : Board_Type := external ("board", "hifive");

Common_Sources := ("src/**", "bsp/");
Target_Sources := "";
case Board is
   when "uno" =>
      Target_Sources := "bsp/atmega328p";
   when "hifive" =>
      Target_Sources := "bsp/freedom_e310-G000";
end case;

for Source_Dirs use Common_Sources & Target_Sources;

Software Design

Interaction with Arduino Sketch

A typical Arduino application exposes two functions to the developer through the sketch file: setup and loop. The developer fills in the setup function with all of the code that should run once at start-up, and populates the loop function with the actual application logic. During the Arduino compilation, these two functions get pre-processed and wrapped into a main generated by the Arduino runtime. More information about the Arduino build process can be found here.

Because we are using the Arduino runtime we cannot have the actual main entry point for the application in the Ada code (the Arduino pre-processor generates this for us). Instead, we have an Arduino sketch file called SPARKZumo.ino which has the typical Arduino setup() and loop() functions. From setup() we need to initialize the Ada environment by calling the function generated by the Ada binder, sparkzumoinit(). Then, we can call whatever setup sequence we want.

CCG maps Ada package and subprogram namespacing into C-like namespacing, so package.subprogram in Ada would become package__subprogram() in C. The setup function we are calling in the sketch is sparkzumo.setup in Ada, which becomes sparkzumo__setup() after CCG generates the files. The loop function we are calling in the sketch is sparkzumo.workloop in Ada, which becomes sparkzumo__workloop().
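That name mapping is mechanical enough to express in a line of Python (ccg_symbol is an illustrative helper matching the examples above, not a CCG tool):

```python
def ccg_symbol(ada_name):
    """Map a fully qualified Ada name to the C symbol name described
    above: dots become double underscores, and the name is lowercased."""
    return ada_name.lower().replace(".", "__")

# ccg_symbol("sparkzumo.setup")    -> "sparkzumo__setup"
# ccg_symbol("sparkzumo.workloop") -> "sparkzumo__workloop"
```

This is handy when wiring the sketch file by hand, since the sketch must call the generated C names, not the Ada ones.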

Handling Exceptions

Even though we are generating C code from Ada, the CCG tool can still expand the Ada code to include many of the compiler generated checks associated with Ada code before generating the C code. This is very cool because we still have much of the power of the Ada language even though we are compiling to C.

If any of these checks fail at runtime, __gnat_last_chance_handler is called. The CCG system supplies a declaration for what this function should look like but leaves the implementation up to the developer. For this application, I put the handler implementation in the sketch file, but it calls back into the Ada code to perform further actions (like blinking LEDs and shutting down the motors). If there is a range check failure, a buffer overflow, or something similar, my __gnat_last_chance_handler dumps some information to the serial port, then calls back into the Ada code to shut down the motors and flash an LED in an infinite loop. We should never actually need this mechanism: since we are using SPARK in this application, we should be able to prove that none of these checks will ever fail!

Standard.h file

The minimal runtime that comes with the CCG tool can be found in the installation directory under the adalib folder. Here you will find the C versions of the Ada runtime files that you would typically find in the adainclude directory.

The important file to know about here is the standard.h file. This is the main C header file that maps Ada constructs to C constructs. For instance, this header file defines the fatptr construct that underlies Ada arrays and strings, as well as integral types like Natural, Positive, and Boolean.

You can and should modify this file to fit within your build environment. For my application, I have included the Arduino.h at the top to bring in the Arduino type system and constructs. Because the Arduino framework defines things like Booleans, I have commented out the versions defined in the standard.h file so that I am consistent with the rest of the Arduino runtime. You can find the edited version of the standard.h file for this project in the src directory.

Drivers

For the application to interact with all of the sensors available on the robot, we need a layer between the runtime and BSP, and the algorithms. The src/drivers directory contains all of the code necessary to communicate with the sensors and motors. Most of the initial source code for this section was a direct port from the zumo-shield library that was originally written in C++. After porting to Ada, the code was modified to be more robust by refactoring and adding SPARK contracts.

Algorithms

Even though this is a sumo robot, I decided to start with a line follower algorithm for the proof of concept. The source code for the line follower algorithm can be found in src/algos/line_finder. The algorithm was originally a direct port of the Line Follow example in the zumo-shield examples repo.

The C++ version of this algorithm worked ok but wasn’t really able to handle occasions where the line was lost, or the robot came to a fork, or an intersection. After refactoring and adding SPARK features, I added a detection lookup so that the robot could determine what type of environment the sensors were looking at. The choices are: Lost (meaning no line is found), Online (meaning there’s a single line), Fork (two lines diverge), BranchLeft (left turn), BranchRight (right turn), Perpendicular intersection (make a decision to go left or right), or Unknown (no clue what to do, let’s keep doing what we were doing and see what happens next). After detecting a change in state, the robot would make a decision like turn left, or turn right to follow a new line. If the robot was in a Lost state, it would go into a “re-finding” algorithm where it would start to do progressively larger circles.

This algorithm worked ok as well, but was a little strange. Occasionally, the robot would decide to change direction in the middle of a line, or start to take a branch and turn back the other way. The reason for this was that the robot was detecting spurious changes in state and reacting to them instantaneously. We can call this state noise. In order to minimize this state noise, I added a state low-pass filter using a geometric graph filter.

The Geometric Graph Filter

If you ask a mathematician, they will probably tell you there’s a better way to filter discrete states than this, but this method worked for me! Let’s picture mapping six points, corresponding to the six detection states, onto a 2D graph, spacing them out evenly along the perimeter of a square. Now, let’s say we have a moving-window average with X positions. Each time we get a state reading from the sensors, we look up the corresponding coordinate for that state in the graph and add the coordinate to the window. For instance, if we detect an Online state, the corresponding coordinate is (15, 15); if we detect a Perpendicular state, the coordinate is (-15, 0); and so on. If we average over the window, we end up with a coordinate somewhere inside the square. If we then section off the area of the square and assign each section to the corresponding state, we will find that our average sits in one of the sections, which maps back to one of our states.

For an example, let’s assume our window is 5 states wide and we have detected the following list of states (BranchLeft, BranchLeft, Online, BranchLeft, Lost). If we map these to coordinates we get the following window: ((-15, 15), (-15, 15), (15, 15), (-15, 15), (-15, -15)). When we average these coordinates in the window we get a point with the coordinates (-9, 9). If we look at our lookup table we can see that this coordinate is in the BranchLeft polygon.
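Using only the state coordinates quoted in this example (the real table places all six states on the square's perimeter; the mapping below is just the subset mentioned here), the averaging step can be sketched as:

```python
# Coordinates for the states mentioned in the example above.
STATE_COORDS = {
    "Online": (15, 15),
    "BranchLeft": (-15, 15),
    "Lost": (-15, -15),
    "Perpendicular": (-15, 0),
}

def window_average(window):
    """Average the graph coordinates of the states in the window."""
    pts = [STATE_COORDS[s] for s in window]
    return (sum(x for x, _ in pts) // len(pts),
            sum(y for _, y in pts) // len(pts))

# window_average(["BranchLeft", "BranchLeft", "Online", "BranchLeft", "Lost"])
# gives (-9, 9), which falls in the BranchLeft section of the square.
```

Integer division is used here since the generated lookup table is indexed by whole graph points.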

One issue that comes up here is that when the average point moves closer to the center of the graph, there’s high state entropy, meaning our state can change more rapidly and noise has a higher effect. To solve this, we can hold on to the previous calculated state, and if the new calculated state is somewhere in the center of the graph, we throw away the new calculation and pass along the previous calculation. We don’t purge the average window though so that if we get enough of one state, the average point can eventually migrate out to that section of the graph.

To avoid having to calculate this geometry every time we get a new state, I generated a lookup table that maps every point in the polygon to a state. All we have to do at runtime is calculate the average over the window and do the lookup. There are some Python scripts that generate most of the src/algos/line_finder/geo_filter.ads file; these scripts also generate a visual of the graph. For more information on them, see part #2 of this blog post! One issue I ran into was that I had to use a very small graph, which decreased my ability to filter. This is because the amount of RAM available on the Arduino Uno is very small, and the larger the graph, the larger the lookup table and the more RAM I needed.
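The real table is generated with shapely polygons; as a toy stand-in, classifying an averaged point by its nearest state anchor gives the flavor of the lookup (nearest_state and the anchor subset are illustrative, not the generated table):

```python
# Anchor coordinates for the states mentioned in the example above.
ANCHOR_COORDS = {
    "Online": (15, 15),
    "BranchLeft": (-15, 15),
    "Lost": (-15, -15),
    "Perpendicular": (-15, 0),
}

def nearest_state(point):
    """Toy classifier: pick the state whose anchor is closest to the
    averaged point (the generated table sections the square exactly)."""
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return min(ANCHOR_COORDS, key=lambda s: dist2(point, ANCHOR_COORDS[s]))
```

For the averaged point (-9, 9) from the earlier example, this picks BranchLeft, matching the polygon lookup described in the text.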

There are a few modifications to this technique that could make it more accurate and more fair. Using a square and only two dimensions to map all the states means that the distance between one pair of states differs from the distance between another pair. For example, it’s easier to switch between BranchLeft and Online than between BranchLeft and Fork. For the proof of concept, though, this technique worked well.

Future Activity

The code still needs a bit of work to get the IMU sensors up and running. We have another project, the Certyflie, which has all of the gimbal calculations needed to synthesize roll, pitch, and yaw data from an IMU. The Arduino Uno is a bit too weak to perform these calculations properly; one issue is that there is no floating point unit on the AVR. The RISC-V has an FPU and is much more powerful. One option is to add a Bluetooth transceiver to the robot and send the IMU data back to a terminal on a laptop for synthesis.

Another issue that came up during development is that the HiFive board uses level shifters on all of the GPIO lines. The level shifters use internal pull-ups, which means that the processor cannot read the reflectance sensors. A reflectance sensor is essentially a capacitor that is discharged when light hits the substrate. To read the sensor, we pull the GPIO line high to charge the capacitor, then pull it low and measure how long it takes to discharge; this tells us how much light is hitting the sensor. Since the HiFive has pull-ups on the GPIO lines, we can’t pull the line low to read the sensor; instead, we are always charging it. More information about this process can be found on the IR sensor manufacturer’s website under How It Works.

There will be a second post coming soon describing how to actually build this crazy project, where I detail the development of the GPS plugin that I used to build everything and flash the board. As always, the code for the entire project is available here: https://github.com/Robert-Tice/SPARKZumo

Happy Hacking!

]]>
Two Days Dedicated to Sound Static Analysis for Security https://blog.adacore.com/sound-static-analysis-for-security Wed, 14 Mar 2018 15:37:00 +0000 Yannick Moy https://blog.adacore.com/sound-static-analysis-for-security

AdaCore has been working with CEA, Inria and NIST to organize a two-day event dedicated to sound static analysis techniques and tools, and to how they are used to increase the security of software-based systems. The program gathers top-notch experts in the field, from industry, government agencies and research institutes, around the three themes of analysis of legacy code, use in new developments and accountable software quality.

The theme "analysis of legacy code" is meant for all those who have to maintain an aging codebase while facing new security threats from the environment. This theme will be introduced by David A. Wheeler, whose contributions to security and open source are well known. Of the many articles of his that I like, I recommend his in-depth analysis of Heartbleed and the State-of-the-Art Resources (SOAR) for Software Vulnerability Detection, Test, and Evaluation, an official government report detailing the tools and techniques for building secure software. David leads the CII Best Practices Badge Program to increase the security of open source software. The presentations in this theme will touch on analysis of binaries, analysis of C code, analysis of Linux kernel code and analysis of nuclear control systems.

The theme "use in new developments" is meant for all those who start new projects with security requirements. This theme will be introduced by K. Rustan M. Leino, an emblematic researcher in program verification, who has inspired many of the profound changes in the field, from his work on ESC/Modula-3 with Greg Nelson to his work on a comprehensive formal verification environment around the Dafny language, with many others in between: ESC/Java, Spec#, Boogie, Chalice, etc. The presentations in this theme will touch on securing mobile platforms and our critical infrastructure, as well as describing techniques for verifying floating-point programs and more complex requirements.

The theme "accountable software quality" is meant for all those who need to justify the security of their software, either because they have regulatory oversight or because of commercial/corporate obligations. This theme will be introduced by David Cok, former VP of Technology and Research at GrammaTech, who is well known for his work on formal verification tools for Java: ESC/Java, ESC/Java2, and now OpenJML. The presentations in this theme will touch on what soundness means for static analysis and the demonstrable benefits it brings, the processes around the use of sound static analysis (including the integration of test and proof results), and the various levels of assurance that can be reached.

The event will take place at the National Institute of Standards and Technology (NIST) at the invitation of researcher Paul Black. Paul co-authored a noted report last year, Dramatically Reducing Software Vulnerabilities, which highlighted sound static analysis as a promising avenue. He will introduce the two days of conference with his perspective on the issue.

The workshop will end with tutorials on Frama-C and SPARK given by the technology developers (CEA and AdaCore), so that attendees can get first-hand experience with the tools. There will also be vendor displays where you can talk with the technology providers. All in all, a unique event to attend, especially when you know that, thanks to our sponsors, participation is free! But registration is compulsory. To see the full program and register, see the webpage of the event.

]]>
Secure Software Architectures Based on Genode + SPARK https://blog.adacore.com/secure-software-architectures-based-on-genode-spark Mon, 05 Mar 2018 13:19:00 +0000 Yannick Moy https://blog.adacore.com/secure-software-architectures-based-on-genode-spark

SPARK user Alexander Senier recently presented their use of SPARK for building secure mobile architectures at BOB Konferenz in Germany. What's nice is that they build on the guarantees that SPARK provides at the software level, using them to create a secure software architecture based on the Genode operating system framework. At 19:07 in the video he presents three interesting architectural designs (policy objects, trusted wrappers, and transient components) that make it possible to build a trustworthy system out of untrustworthy building blocks (like a Web browser or a network stack). Almost as exciting as alchemy's goal of transforming lead into gold!

Their solution is to design architectures where untrusted components must communicate through trusted ones. They use Genode to enforce the rule that no other communications are allowed and SPARK to make sure that trusted components can really be trusted. You can see an example of an application they build with these technologies at Componolit at 33:37 in the video: a baseband firewall, to protect the Android platform on a mobile device (e.g., your phone) from attacks that get through the baseband processor, which manages radio communications on your mobile.

As the title of the talk says, for the security of connected devices in the modern world, we are at a time "when one beyond-mainstream technology is not enough". For more info on what they do, see the Componolit website.

]]>

Updated July 2018

The micro:bit is a very small ARM Cortex-M0 board designed by the BBC for computer education. It's fitted with a Nordic nRF51 Bluetooth-enabled 32-bit ARM microcontroller. At $15, it is one of the cheapest yet most fun pieces of kit with which to start embedded programming.

Since the initial release of this blog post, we have improved the support of Ada and SPARK on the BBC micro:bit. In GNAT Community Edition 2018, the micro:bit is now directly supported on Linux, Windows and MacOS. This means that the procedure to use the board is greatly simplified:

• Download and install GNAT arm-elf hosted on your platform: Windows, Linux or MacOS. This package contains the ARM cross compiler as well as the required Ada run-times

• Download and install GNAT native for your platform: Windows, Linux or MacOS. This package contains the GNAT Programming Studio IDE and an example to run on the micro:bit

• Start GNAT Programming Studio

• Click on “Create a new Project”

• Select the “Scrolling Text” project under “BBC micro:bit” and click Next

• Enter the directory where you want the project to be deployed and click Apply

• On Linux only: you might need privileges to access the USB ports, without which the flash program will say "No connected boards". On Ubuntu, you can fix this by creating (as administrator) the file /etc/udev/rules.d/mbed.rules and adding the line:

SUBSYSTEM=="usb", ATTR{idVendor}=="0d28", ATTR{idProduct}=="0204", MODE:="666"

then restarting the service by doing:

$ sudo udevadm trigger
• Plug your micro:bit board with a USB cable, and wait for the system to recognize it. This can take a few seconds

• Back in GNAT Programming Studio, click on the “flash to board” icon

• That’s it!

We also improved the micro:bit support and documentation in the Ada Drivers Library project. Follow this link for documented examples of the various features available on the board (text scrolling, buttons, digital in/out, analog in/out, music).

Conclusion

That’s it, your first Ada program on the micro:bit! If you have an issue with this procedure, please tell us in the comments section below.

In the meantime, here is an example of the kind of project that you can do with Ada on the Micro:Bit

]]>
Tokeneer Fully Verified with SPARK 2014 https://blog.adacore.com/tokeneer-fully-verified-with-spark-2014 Fri, 23 Feb 2018 09:49:00 +0000 Yannick Moy https://blog.adacore.com/tokeneer-fully-verified-with-spark-2014

Tokeneer is software for controlling physical access to a secure enclave by means of a fingerprint sensor. This software was created by Altran (Praxis at the time) in 2003, using the previous generation of the SPARK language and tools, as part of a project commissioned by the NSA to investigate the rigorous development of critical software using formal methods.

The project artefacts, including the source code, were released as open source in 2008. Tokeneer was widely recognized as a milestone in industrial formal verification. Original project artefacts, including the original source code in SPARK 2005, are available here.

We recently transitioned this software to SPARK 2014, which allowed us to go beyond what was possible with the previous SPARK technology. The initial transition by Altran and AdaCore took place in 2013-2014, when we translated all the contracts from SPARK 2005 syntax (stylized comments in the code) to SPARK 2014 syntax (aspects in the code). At the time, however, we did not invest the effort to fully prove the resulting translated code. This is what we have now completed. The resulting code is available on GitHub. It will also be available in future SPARK releases as one of the distributed examples.

What we did

With a few changes, we went from 234 unproved checks on the Tokeneer code (the version originally translated to SPARK 2014) down to 39 unproved but justified checks. The justification is important here: there are limitations to GNATprove's analysis, so it is expected that users must sometimes step in and take responsibility for unproved checks.

Using predicates to express constraints

Most of the 39 justifications in the Tokeneer code are for string concatenations that involve attribute 'Image. GNATprove currently does not know that S'Image(X), for a scalar type S and a variable X of this type, returns a rather small string (as specified in the Ada RM), so it issues a possible range check message when concatenating such an image with any other string. We chose to isolate such calls to 'Image in dedicated functions, with suitable predicates on their return type to convey the information about the small string result. Take for example the enumeration type ElementT in audittypes.ads. We define a function ElementT_Image which returns a small string starting at index 1 and with length at most 20, as follows:

   function ElementT_Image (X : ElementT) return CommonTypes.StringF1L20 is
     (ElementT'Image (X));
   pragma Annotate (GNATprove, False_Positive,
                    "range check might fail",
                    "Image of enums of type ElementT are short strings starting at index 1");
   pragma Annotate (GNATprove, False_Positive,
                    "predicate check might fail",
                    "Image of enums of type ElementT are short strings starting at index 1");

Note the use of pragma Annotate to justify the range check message and the predicate check message that are generated by GNATprove otherwise. Type StringF1L20 is defined as a subtype of the standard String type with additional constraints expressed as predicates. In fact, we create an intermediate subtype StringF1 of strings that start at index 1 and which are not "super flat", i.e. their last index is at least 0. StringF1L20 inherits from the predicate of StringF1 and adds the constraint that the length of the string is no more than 20:

   subtype StringF1 is String with
     Predicate => StringF1'First = 1 and StringF1'Last >= 0;
   subtype StringF1L20 is StringF1 with
     Predicate => StringF1L20'Last <= 20;
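To see how the predicate information gets used in proof, here is a minimal sketch of client code (the procedure and the logging call are illustrative, not from the Tokeneer sources): because ElementT_Image returns a StringF1L20, GNATprove can bound the length of the concatenation.

```ada
--  Illustrative client, not from the Tokeneer sources.
procedure Log_Element (E : ElementT) is
   --  ElementT_Image returns a StringF1L20, so its result starts at
   --  index 1 and has length at most 20; the concatenation below is
   --  therefore provably within String index bounds.
   Msg : constant String := "element " & ElementT_Image (E);
begin
   AuditLog.Add_Entry (Msg);  --  hypothetical logging call
end Log_Element;
```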

Moving query functions to the spec

Another crucial change was to give visibility to client code over query functions used in contracts. Take for example the API in admin.ads. It defines the behavior of the administrator through subprograms whose contracts use query functions RolePresent, IsPresent and IsDoingOp:

   procedure Logout (TheAdmin :    out T)
   with Global => null,
        Post   => not IsPresent (TheAdmin)
                  and not IsDoingOp (TheAdmin);

The issue was that these query functions, while conveniently abstracting away the details of what it means for the administrator to be present, or to be doing an operation, were defined in the body of package Admin, inside file admin.adb. As a result, the proof of client code of Admin had to consider these calls as blackboxes, which resulted in many unprovable checks. The fix here consisted in moving the definition for the query functions inside the private part of the spec file admin.ads: this way, client code still does not see their implementation, but GNATprove can use these expression functions in proving client code.

   function RolePresent (TheAdmin : T) return PrivTypes.PrivilegeT is
     (TheAdmin.RolePresent);

   function IsPresent (TheAdmin : T) return Boolean is
     (TheAdmin.RolePresent /= PrivTypes.UserOnly);

   function IsDoingOp (TheAdmin : T) return Boolean is
     (TheAdmin.CurrentOp in OpT);

Using type invariants to enforce global invariants

Some global properties in the SPARK 2005 version were justified manually, like the global invariant maintained in package AuditLog over the global variables encoding the state of the files used to log operations: CurrentLogFile, NumberLogEntries, UsedLogFiles and LogFileEntries. Here is the text of this justification:

-- Proof Review file for

-- VC 6
-- C1:    fld_numberlogentries(state) = (fld_length(fld_usedlogfiles(state)) - 1)
--           * 1024 + element(fld_logfileentries(state), [fld_currentlogfile(state)
--           ]) .
-- C1 is a package state invariant.
-- proof shows that all public routines that modify NumberLogEntries, UsedLogFiles.Length,
-- CurrentLogFile or LogFileEntries(CurrentLogFile) maintain this invariant.
-- This invariant has not been propogated to the specification since it would unecessarily
-- complicate proof of compenents that use the facilities from this package.

We can do better in SPARK 2014, by expressing this property as a type invariant. This requires all four variables to become components of the same record type, so that a single global variable LogFiles replaces them:

   type LogFileStateT is record
      CurrentLogFile   : LogFileIndexT  := 1;
      NumberLogEntries : LogEntryCountT := 0;
      UsedLogFiles     : LogFileListT   :=
        LogFileListT'(List   => (others => 1),
                      LastI  => 1,
                      Length => 1);
      LogFileEntries   : LogFileEntryT  := (others => 0);
   end record
   with Type_Invariant =>
     Valid_NumberLogEntries
       (CurrentLogFile, NumberLogEntries, UsedLogFiles, LogFileEntries);

   LogFiles         : LogFilesT := LogFilesT'(others => File.NullFile)
   with Part_Of => FileState;

With this change, all public subprograms updating the state of log files can now assume the invariant holds on entry (it is checked by GNATprove on every call) and must restore it on exit (it is checked by GNATprove when returning from the subprogram). Locally defined subprograms need not obey this constraint, however, which is exactly what is needed here. One subtlety is that some of these local subprograms were accessing the state of log files as global variables. If we had kept LogFiles as a global variable, SPARK rules would have required that its invariant be checked on entry to and exit from these subprograms. Instead, we changed the signature of these local subprograms to take LogFiles as an additional parameter, on which the invariant need not hold.
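This boundary behavior of type invariants can be illustrated with a minimal standalone sketch (names are illustrative, not from Tokeneer): the invariant may be broken temporarily inside the package body, but must hold whenever a value of the type crosses the public interface.

```ada
package Counters is
   type Counter is private;

   procedure Bump (C : in out Counter);
   --  The invariant is assumed on entry to Bump and checked on exit.
private
   type Counter is record
      Value : Natural := 0;
      Twice : Natural := 0;
   end record
   with Type_Invariant => Counter.Twice = 2 * Counter.Value;
end Counters;

package body Counters is
   procedure Bump (C : in out Counter) is
   begin
      C.Value := C.Value + 1;
      --  Invariant temporarily broken here; code local to the package
      --  is exempt from the check.
      C.Twice := C.Twice + 2;  --  restored before returning to a client
   end Bump;
end Counters;
```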

Other transformations on contracts

A few other transformations were needed to make contracts provable with SPARK 2014. In particular, it was necessary to change a number of "and" logical operations into their short-circuit version "and then". See for example this part of the precondition of Processing in tismain.adb:

       (if (Admin.IsDoingOp(TheAdmin) and
            Admin.RolePresent(TheAdmin) = PrivTypes.Guard)
        then
           Admin.TheCurrentOp(TheAdmin) = Admin.OverrideLock)

The issue was that calling TheCurrentOp requires that IsDoingOp holds:

   function TheCurrentOp (TheAdmin : T) return OpT
   with Global => null,
        Pre    => IsDoingOp (TheAdmin);

Since the "and" logical operation evaluates both its operands, TheCurrentOp would also be called in contexts where IsDoingOp does not hold, leading to a precondition failure. The fix is simply to use the short-circuit equivalent:

       (if (Admin.IsDoingOp(TheAdmin) and then
            Admin.RolePresent(TheAdmin) = PrivTypes.Guard)
        then
           Admin.TheCurrentOp(TheAdmin) = Admin.OverrideLock)

We also added a few loop invariants that were missing.

You can read the original Tokeneer report for a description of the security properties that were provably enforced through formal verification.

To demonstrate that formal verification indeed brings assurance that certain security vulnerabilities are not present, we seeded four vulnerabilities in the code and reanalyzed it. The analysis of GNATprove (either through flow analysis or proof) detected all four: an information leak, a back door, a buffer overflow and an implementation flaw. You can see this in action in this short 4-minute video.

]]>

This blog post is part two of a tutorial based on the OpenGLAda project and will cover implementation details such as a type system for interfacing with C, error handling, memory management, and loading functions.

If you haven't read part one, I encourage you to do so. It can be found here.

Wrapping Types

As part of the binding process, we noted in the previous blog post that we will need to translate typedefs within the OpenGL C headers into Ada types so that our descriptions of C functions that take arguments or return values are accurate. Let's begin with the basic numeric types:

with Interfaces.C;

package GL.Types is
   type Int   is new Interfaces.C.int;      --  GLint
   type UInt  is new Interfaces.C.unsigned; --  GLuint

   subtype Size is Int range 0 .. Int'Last; --  GLsizei

   type Single is new Interfaces.C.C_float; --  GLfloat
   type Double is new Interfaces.C.double;  --  GLdouble
end GL.Types;

We use Single as the name for the single-precision floating point type to avoid confusion with Ada's Standard.Float. Moreover, we can apply Ada's powerful numeric typing in our definition of GLsizei by defining it with a non-negative range. This affords us some extra compile-time and run-time checks without having to add any conditionals – something not possible in C.
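A minimal standalone sketch (with the types re-declared locally for illustration) shows the kind of check we get for free from the non-negative range:

```ada
with Ada.Text_IO;

procedure Size_Demo is
   type Int is new Integer;                  --  stand-in for GL.Types.Int
   subtype Size is Int range 0 .. Int'Last;  --  stand-in for GLsizei

   Count : Int := -3;
begin
   declare
      S : constant Size := Size (Count);  --  fails the range check
   begin
      Ada.Text_IO.Put_Line (Size'Image (S));
   end;
exception
   when Constraint_Error =>
      Ada.Text_IO.Put_Line ("negative size rejected at run time");
end Size_Demo;
```

In C, the equivalent guard would require an explicit `if (count < 0)` test at every call site; here the check is part of the type.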

The type list above is, of course, shortened for this post; however, two important types are declared elsewhere:

• GLenum, which is used for parameters that take a well-defined set of values specified with #define directives in the OpenGL header. Since we want to make the Ada interface safe, we will use real enumeration types for these.
• GLboolean, which is an unsigned char representing a boolean value. We do not want to have a custom boolean type in the Ada API because it will not add any value compared to using Ada's Boolean type (unlike e.g. the Int type, which may have a different range than Ada's Integer type).

For these types, we define another package called GL.Low_Level:

with Interfaces.C;

package GL.Low_Level is
   type Bool is new Boolean;

   subtype Enum is Interfaces.C.unsigned;
private
   for Bool use (False => 0, True => 1);
   for Bool'Size use Interfaces.C.unsigned_char'Size;
end GL.Low_Level;

We now have a Bool type that we can use for API imports and an Enum type that we will solely use to define the size of our enumeration types. Note that Bool also is an enumeration type, but uses the size of unsigned_char because that is what OpenGL defines for GLboolean.

To show how we can wrap GLenum into actual Ada enumeration types, let's examine glGetError, which is defined like this in the C header:

GLenum glGetError(void);

The return value is one of several error codes defined as preprocessor macros in the header. We translate these into an Ada enumeration, then wrap the subprogram, resulting in the following:

package GL.Errors is
   type Error_Code is
     (No_Error, Invalid_Enum, Invalid_Value, Invalid_Operation,
      Stack_Overflow, Stack_Underflow, Out_Of_Memory,
      Invalid_Framebuffer_Operation);

   function Error_Flag return Error_Code;
private
   for Error_Code use
     (No_Error                      => 0,
      Invalid_Enum                  => 16#0500#,
      Invalid_Value                 => 16#0501#,
      Invalid_Operation             => 16#0502#,
      Stack_Overflow                => 16#0503#,
      Stack_Underflow               => 16#0504#,
      Out_Of_Memory                 => 16#0505#,
      Invalid_Framebuffer_Operation => 16#0506#);
   for Error_Code'Size use Low_Level.Enum'Size;
end GL.Errors;

With the above code we encode the errors defined in the C header as representations for our enumeration values - this way, our safe enumeration type has the exact same memory layout as the defined error codes and maintains compatibility.

We then add the backend for Error_Flag as an import in GL.API:

function Get_Error return Errors.Error_Code;
pragma Import (StdCall, Get_Error, "glGetError");

Error Handling

The OpenGL specification states that whenever an error arises while calling a function of the API, an internal error flag gets set. This flag can then be retrieved with the function glGetError we wrapped above.

It would certainly be nicer, though, if these API calls raised Ada exceptions instead. But this would mean that in every wrapper of an OpenGL function that may set the error flag, we'd need to call Get_Error and, when the returned flag is something other than No_Error, raise the appropriate exception. Depending on what the user does with the API, this may lead to significant overhead (let us not forget that OpenGL is much more performance-critical than it is safety-critical). In fact, more recent graphics APIs like Vulkan have debugging extensions which require manual tuning to receive error messages - in other words, due to overhead, Vulkan turns off all error checking by default.

So, what we will provide is a feature that auto-raises exceptions whenever the error flag is set, but we will make it optional. To achieve this, Ada exceptions corresponding to OpenGL's error flags need to be defined.

Let’s add the following exception definitions to GL.Errors:

Invalid_Operation_Error             : exception;
Out_Of_Memory_Error                 : exception;
Invalid_Value_Error                 : exception;
Stack_Overflow_Error                : exception;
Stack_Underflow_Error               : exception;
Invalid_Framebuffer_Operation_Error : exception;
Internal_Error                      : exception;

Notice that the exception names mirror the corresponding enumeration values in the same package, with an _Error suffix to keep the two sets of names distinct. Also notice the exception Internal_Error, which does not correspond to any OpenGL error – we'll see later what we need it for.

Next, we need a procedure that queries the error flag and possibly raises the appropriate exception. Since we will be using such a procedure almost everywhere in our wrapper, let's declare it in the private part of the GL package so that all of GL's child packages have access to it:

procedure Raise_Exception_On_OpenGL_Error;

And in the body:

procedure Raise_Exception_On_OpenGL_Error is separate;

Here, we tell Ada that this procedure is defined in a separate compilation unit, enabling us to provide different implementations depending on whether the user wants automatic exception raising enabled or not. Before we continue, though, let's set up our project with this in mind:

library project OpenGL is
   --  Windowing_System config omitted

   type Toggle_Type is ("enabled", "disabled");
   Auto_Exceptions : Toggle_Type := external ("Auto_Exceptions", "enabled");

   OpenGL_Sources := ("src");
   case Auto_Exceptions is
      when "enabled" =>
         OpenGL_Sources := OpenGL_Sources & "src/auto_exceptions";
      when "disabled" =>
         OpenGL_Sources := OpenGL_Sources & "src/no_auto_exceptions";
   end case;
   for Source_Dirs use OpenGL_Sources;

   --  packages and other things omitted
end OpenGL;

To conform with the modifications made to the project file, we must now create two new directories inside the src folder and place the implementations of our procedure accordingly. GNAT expects both source files to be named gl-raise_exception_on_opengl_error.adb. The implementation in no_auto_exceptions is trivial:

separate (GL)
procedure Raise_Exception_On_OpenGL_Error is
begin
   null;
end Raise_Exception_On_OpenGL_Error;


And the one in auto_exceptions looks like this:

with GL.Errors;

separate (GL)
procedure Raise_Exception_On_OpenGL_Error is
begin
   case Errors.Error_Flag is
      when Errors.Invalid_Operation             => raise Errors.Invalid_Operation_Error;
      when Errors.Invalid_Value                 => raise Errors.Invalid_Value_Error;
      when Errors.Invalid_Framebuffer_Operation => raise Errors.Invalid_Framebuffer_Operation_Error;
      when Errors.Out_Of_Memory                 => raise Errors.Out_Of_Memory_Error;
      when Errors.Stack_Overflow                => raise Errors.Stack_Overflow_Error;
      when Errors.Stack_Underflow               => raise Errors.Stack_Underflow_Error;
      when Errors.Invalid_Enum                  => raise Errors.Internal_Error;
      when Errors.No_Error                      => null;
   end case;
exception
   when Constraint_Error => raise Errors.Internal_Error;
end Raise_Exception_On_OpenGL_Error;

The exception section at the end detects cases where glGetError returns a value we did not know of at the time of implementing this wrapper. Ada then tries to map this value to the Error_Code enumeration, and since the value does not correspond to any value specified in the type definition, the program raises a Constraint_Error. Of course, OpenGL is very conservative about adding error flags, so this is unlikely to happen, but it is still nice to plan for the future.

Fetching Function Pointers at Runtime

Part 1: Implementing the "Fetching" Function

As previously noted, many functions from the OpenGL API must be retrieved as function pointers at run-time instead of being linked at compile-time. The reason for this once again comes down to the concept of graceful degradation: if some functionality exists as an extension (especially functions not part of the OpenGL core) but is unimplemented by a target graphics card driver, the programmer will be able to recognize this case when setting the relevant function pointers during execution. Unfortunately, though, this creates an extra step which prevents us from simply importing the whole API, and, worse still, on Windows no functions defined by OpenGL 2.0 or later are available for compile-time linking, making programmatic queries required.

So then, the question arises: how are these function pointers to be retrieved? Sadly, this functionality is not available from within the OpenGL API or driver; it is instead provided by platform-specific extensions or, more specifically, by the windowing system supporting OpenGL. So, as with exception handling, we will use a procedure with multiple implementations and switch to the appropriate one via GPRBuild:

case Windowing_System is
   when "windows" => OpenGL_Sources := OpenGL_Sources & "src/windows";
   when "x11"     => OpenGL_Sources := OpenGL_Sources & "src/x11";
   when "quartz"  => OpenGL_Sources := OpenGL_Sources & "src/mac";
end case;

...and we declare this function in the main source:

function GL.API.Subprogram_Reference (Function_Name : String)
return System.Address;

Then finally, in the windowing-system specific folders, we place the implementation and necessary imports from the windowing system's API. Those imports and the subsequent implementations are not very interesting, so I will not discuss them at length here, but I will show you the implementation for Apple's Mac operating system to give you an idea:

with Interfaces.C.Strings;

with GL.API.Mac_OS_X;

function GL.API.Subprogram_Reference (Function_Name : String)
  return System.Address
is
   --  OSX-specific implementation uses CoreFoundation functions
   use GL.API.Mac_OS_X;

   package IFC renames Interfaces.C.Strings;

   GL_Function_Name_C : IFC.chars_ptr := IFC.New_String (Function_Name);

   Symbol_Name : constant CFStringRef :=
     CFStringCreateWithCString
       (cStr     => GL_Function_Name_C,
        encoding => kCFStringEncodingASCII);

   Result : constant System.Address :=
     CFBundleGetFunctionPointerForName
       (bundle       => OpenGLFramework,
        functionName => Symbol_Name);
begin
   CFRelease (Symbol_Name);
   IFC.Free (GL_Function_Name_C);
   return Result;
end GL.API.Subprogram_Reference;

With the above code in effect, we are now able to retrieve the function pointers; however, we still need to implement the querying machinery, for which there are three possible approaches:

• Lazy: When a feature is first needed, its corresponding function pointer is loaded and stored for future use. This approach may produce the least total loading work for the resulting application although, theoretically, it makes the performance of a call unpredictable. Since fetching a function pointer is a fairly trivial operation, however, this is not really a practical argument against it.
• Eager: At some defined point in time, a call gets issued to a loading function for every function pointer that is supported by OpenGLAda. The eager approach produces the largest amount of work for the resulting application, but again, since loading is trivial, it does not noticeably slow down the application (and even if it did, it would do so during initialization, where it is most tolerable).
• Explicit: The user is required to specify which features they want to use, and we only load the function pointers related to those features. Explicit loading places the heaviest burden on the user, since they must state which features they will be using.

Overall, the consequences of choosing any of these three possibilities are mild, so we will go with the one easiest to implement: the eager approach, which is also the one used by many other popular OpenGL libraries.
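For contrast, a lazy variant could be sketched like this (a fragment only; the names Create_Shader_Ptr and Load_T1 are hypothetical helpers, though T1 matches the generated access type discussed below):

```ada
--  Hypothetical lazy loader: fetch the pointer on first use, cache it,
--  and return the cached value afterwards.
Create_Shader_Ptr : T1 := null;

function Create_Shader_Lazy return T1 is
begin
   if Create_Shader_Ptr = null then
      Create_Shader_Ptr := Load_T1 ("glCreateShader");
   end if;
   return Create_Shader_Ptr;
end Create_Shader_Lazy;
```

Every call site would then go through Create_Shader_Lazy instead of a plain variable, which is exactly the per-call indirection the eager approach avoids.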

Part 2: Autogenerating the Fetching Implementation

For each OpenGL function we import that must be loaded at runtime we need to create three things:

• The definition of an access type describing the function's parameters and return types.
• A global variable having this type to hold the function pointer as soon as it gets loaded.
• A call to a platform-specific function which will return the appropriate function pointer from a DLL or library for storage into our global function pointer.

Implementing these segments for each subprogram is a very repetitive task, which hints at the possibility of automating it. To check whether this is feasible, let's go over the actual information we need to write in each of these code segments for an imported OpenGL function:

• The name of the C function we import

• The name the subprogram will have in the Ada API

• The parameter list and return type of the subprogram

As you can see, this is almost exactly the same information we would need to write an imported subprogram loaded at compile time! To keep all information about imported OpenGL functions centralized, let's craft a simple specification format where we can list this information for each subprogram.

Since we need to define Ada subprogram signatures, it seems a good idea to use Ada-like syntax (like GPRBuild does for its project files). After writing a small parser (I will not show the details here since that is outside the scope of this post), we can now process a specification file like the following. We will discuss the package GL.Objects.Shaders and what it does in a bit.

with GL.Errors;
with GL.Objects.Shaders;
with GL.Types;

spec GL.API is
   use GL.Types;

   function Get_Error return Errors.Error_Code with Implicit => "glGetError";
   procedure Flush with Implicit => "glFlush";

   function Create_Shader (Shader_Type : Objects.Shaders.Shader_Type)
     return UInt with Explicit => "glCreateShader";
end GL.API;

This specification contains the two imports we have already created manually and one new one – Create_Shader, our example of a subprogram that needs to be loaded via function pointer. We use Ada 2012-like aspect syntax to specify the target link name and the import mode. There are two import modes:

• Implicit - meaning that the subprogram will be imported via pragmas. This will give us a subprogram declaration that will be bound to its implementation by the dynamic library loader. So it happens implicitly and we do not actually need to write any code for it. This is what we previously did in our import of glFlush in part one.
• Explicit - meaning that the subprogram will be provided as a function pointer variable. We will need to generate code that assigns a proper value to that variable at runtime in this case.

Processing this specification generates the following Ada units:

with GL.Errors;
with GL.Objects.Shaders;
with GL.Types;

private package GL.API is
   use GL.Types;

   type T1 is access function
     (Shader_Type : Objects.Shaders.Shader_Type) return UInt;
   pragma Convention (StdCall, T1);

   Create_Shader : T1;

   function Get_Error return Errors.Error_Code;
   pragma Import (StdCall, Get_Error, "glGetError");

   procedure Flush;
   pragma Import (StdCall, Flush, "glFlush");
end GL.API;

--  ---------------

with System;
with Ada.Unchecked_Conversion;

with GL.API.Subprogram_Reference;
use GL.API;

procedure GL.Load_Function_Pointers is

   generic
      type Function_Reference is private;
   function Load (Function_Name : String) return Function_Reference;

   function Load (Function_Name : String) return Function_Reference is
      function As_Function_Reference is new Ada.Unchecked_Conversion
        (Source => System.Address,
         Target => Function_Reference);

      Raw : constant System.Address := Subprogram_Reference (Function_Name);
   begin
      return As_Function_Reference (Raw);
   end Load;

   function Load_T1 is new Load (T1);
begin
   GL.API.Create_Shader := Load_T1 ("glCreateShader");
end GL.Load_Function_Pointers;

Notice how our implicit subprograms get imported as before, but for the explicit subprogram, a type T1 is created as an access type to the subprogram, and a global variable Create_Shader of this type is defined - satisfying all of our needs.

The procedure GL.Load_Function_Pointers contains the code to fill this variable with the right value by obtaining a function pointer using the platform-specific implementation discussed above. The generic load function exists so that additional function pointers can be loaded using this same code.

The only thing left to do is to expose this functionality in the public interface like the example below:

package GL is
--  ... other code

procedure Init;

--  ... other code
end GL;

--  ------

package body GL is
   --  ... other code

   procedure Init is
   begin
      Load_Function_Pointers;
   end Init;

   --  ... other code
end GL;

Of course, we now require the user to explicitly call Init somewhere in their code. You might think that we could automatically execute the loading code at package initialization, but this would not work, because some OpenGL implementations (most prominently the one on Windows) refuse to load any OpenGL function pointers unless there is a current OpenGL context. This context only exists after we have created an OpenGL surface to render on, which is done programmatically by the user.

In practice, OpenGLAda includes a binding to the GLFW library as a platform-independent way of creating windows with an OpenGL surface on them, and this binding automatically calls Init whenever a window is made current (i.e. placed in foreground), so that the user does not actually need to worry about it. However, there may be other use-cases that do not employ GLFW, like, for example, creating an OpenGL surface widget with GtkAda. In that case, calling Init manually is still required given our design.
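Under these constraints, the call order in a program that does not use GLFW would look roughly like this (a sketch only; Some_Toolkit and its subprograms are hypothetical placeholders for whatever windowing binding is in use):

```ada
--  Sketch: GL.Init may only run once an OpenGL context is current.
Window  := Some_Toolkit.Create_Window (Width => 640, Height => 480);
Context := Some_Toolkit.Create_GL_Context (Window);
Some_Toolkit.Make_Current (Context);

GL.Init;  --  only now is it safe to load the function pointers
```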

Memory Management

The OpenGL API enables us to create various objects that reside in GPU memory, for things like textures or vertex buffers. Creating such an object gives us an ID (much like a memory address) which we then use to refer to the object. To avoid memory leaks, we want to manage these IDs automatically in our Ada wrapper so that objects are destroyed once the last reference vanishes. Ada's controlled types are an ideal candidate for the job. Let's start writing a package GL.Objects to encapsulate the functionality:

private with Ada.Finalization;

package GL.Objects is
   use GL.Types;

   type GL_Object is abstract tagged private;

   procedure Initialize_Id (Object : in out GL_Object);

   procedure Clear (Object : in out GL_Object);

   function Initialized (Object : GL_Object) return Boolean;

   procedure Internal_Create_Id
     (Object : GL_Object; Id : out UInt) is abstract;

   procedure Internal_Release_Id
     (Object : GL_Object; Id : UInt) is abstract;
private
   type GL_Object_Reference;
   type GL_Object_Reference_Access is access all GL_Object_Reference;

   type GL_Object_Reference is record
      GL_Id           : UInt;
      Reference_Count : Natural;
      Is_Owner        : Boolean;
   end record;

   type GL_Object is abstract new Ada.Finalization.Controlled with record
      Reference : GL_Object_Reference_Access := null;
   end record;

   --  Increases reference count.
   overriding procedure Adjust (Object : in out GL_Object);

   --  Decreases reference count. Destroys the object when it reaches zero.
   overriding procedure Finalize (Object : in out GL_Object);
end GL.Objects;

GL_Object is our smart pointer here, and GL_Object_Reference is the holder of the object's ID as well as the reference count. We will derive the actual object types (of which there are quite a few) from GL_Object; the base type is abstract so that we can define subprograms that must be overridden by the child types. Note that since the class hierarchy is rooted at GL_Object, all derived types have an identically typed handle to a GL_Object_Reference object, and thus our reference counting is independent of the actual derived type.

The only thing the derived type must declare in order for our automatic memory management to work is how to create and delete the OpenGL object in GPU memory – this is what Internal_Create_Id and Internal_Release_Id in the above segment are for. Because they are abstract, they must be put into the public part of the package even though they should never be called by the user directly.

The core of our smart pointer machinery will be implemented in the Adjust and Finalize procedures inherited from Ada.Finalization.Controlled. Since this topic has already been extensively covered in this Ada Gem, I am going to skip over the gory implementation details.

So, to create a new OpenGL object the user must call Initialize_Id on a smart pointer which assigns the ID of the newly created object to the smart pointer's backing object. Clear can then later be used to make the smart pointer uninitialized again (but only delete the object if the reference count reaches zero).
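For completeness, here is a rough sketch of what those two overridings can look like. This is a simplified illustration rather than the actual OpenGLAda implementation: Free is assumed to be an instantiation of Ada.Unchecked_Deallocation for GL_Object_Reference_Access, and the allocation performed by Initialize_Id is omitted.

```ada
overriding procedure Adjust (Object : in out GL_Object) is
begin
   --  Called after assignment: the copy shares the reference record,
   --  so increment the count.
   if Object.Reference /= null then
      Object.Reference.Reference_Count :=
        Object.Reference.Reference_Count + 1;
   end if;
end Adjust;

overriding procedure Finalize (Object : in out GL_Object) is
begin
   --  Called when the object goes out of scope: drop our reference and,
   --  once nobody refers to the GPU object anymore, delete it.
   if Object.Reference /= null then
      Object.Reference.Reference_Count :=
        Object.Reference.Reference_Count - 1;
      if Object.Reference.Reference_Count = 0 then
         GL_Object'Class (Object).Internal_Release_Id
           (Object.Reference.GL_Id);
         Free (Object.Reference);  --  assumed Unchecked_Deallocation instance
      end if;
      Object.Reference := null;
   end if;
end Finalize;
```

The conversion to GL_Object'Class makes the call to Internal_Release_Id dispatch to the derived type's implementation, so the base package never needs to know how a particular kind of object is deleted.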

To test our system, let's implement a Shader object. Shader objects will hold source code and compiled binaries of GLSL (OpenGL Shading Language) shaders. We will call this package GL.Objects.Shaders in keeping with the rest of the project's structure:

package GL.Objects.Shaders is
   pragma Preelaborate;

   type Shader_Type is
     (Fragment_Shader, Vertex_Shader, Geometry_Shader);

   type Shader (Kind : Shader_Type) is new GL_Object with private;

   procedure Set_Source (Subject : Shader; Source : String);

   function Compile_Status (Subject : Shader) return Boolean;

   function Info_Log (Subject : Shader) return String;

private

   type Shader (Kind : Shader_Type) is new GL_Object with null record;

   overriding
   procedure Internal_Create_Id (Object : Shader; Id : out UInt);

   overriding
   procedure Internal_Release_Id (Object : Shader; Id : UInt);

end GL.Objects.Shaders;

The two overriding procedures are implemented like this:

overriding
procedure Internal_Create_Id (Object : Shader; Id : out UInt) is
begin
   Id := API.Create_Shader (Object.Kind);
   Raise_Exception_On_OpenGL_Error;
end Internal_Create_Id;

overriding
procedure Internal_Release_Id (Object : Shader; Id : UInt) is
   pragma Unreferenced (Object);
begin
   API.Delete_Shader (Id);
   Raise_Exception_On_OpenGL_Error;
end Internal_Release_Id;

Of course, we need to add the subprogram Delete_Shader to our import specification so that it will be available in the generated GL.API package. A nice thing is that, in Ada, pointer dereference is often done implicitly, so we need not worry about whether Create_Shader and Delete_Shader are loaded via function pointers or with the dynamic library loader – the code looks exactly the same in both cases!
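To illustrate how this comes together for a user of the library, client code could look roughly like the following sketch. The Kind discriminant, the Vertex_Shader literal, and the Compile procedure are assumptions about parts of the shader API not shown above; the point is that no explicit cleanup code is needed:

```ada
declare
   S : Shader (Kind => Vertex_Shader);  --  discriminant is an assumption
begin
   S.Initialize_Id;  --  creates the shader object on the GPU
   S.Set_Source ("void main() { gl_Position = vec4 (0.0, 0.0, 0.0, 1.0); }");
   S.Compile;        --  assumed wrapper around glCompileShader
   if not S.Compile_Status then
      Ada.Text_IO.Put_Line ("compilation failed: " & S.Info_Log);
   end if;
end;
--  Finalize runs here: the GPU object is deleted once the last
--  reference disappears.
```

Because Shader is a controlled type, the reference count is maintained across assignments and scope exits without any intervention from the user.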

Documentation

One problem we did not yet address is documentation. After all, we are adding structure and complexity to the OpenGL API that does not exist in its specification; how, then, is a user supposed to find the wrapper of a certain OpenGL function they want to use?

What we need to do, then, is generate a list in which each OpenGL function we wrap is linked to its respective wrapper in OpenGLAda's API. Of course, we do not want to generate that list manually. Instead, let's use our import specification again and enrich it with additional information:

function Get_Error return Errors.Error_Code with
  Implicit => "glGetError", Wrapper => "GL.Errors.Error_Flag";
procedure Flush with
  Implicit => "glFlush", Wrapper => "GL.Flush";

With the new "aspect-like" declarations in our template, we can enhance our generator with code that writes a Markdown file listing all imported OpenGL functions and linking each to its wrapper. In theory, we could even avoid adding the wrapper information explicitly by analyzing OpenGLAda's code to detect which subprogram wraps which OpenGL function. Tools like ASIS and Libadalang would help us with that, but such an implementation would be far more work than adding the wrapper references explicitly.

The generated list can be seen on OpenGLAda's website showing all the functions that are actually supported. It is intended to be navigated via search (a.k.a. Ctrl+F).

Conclusion

By breaking down the complexities of a large C API like OpenGL, we have gone through quite a few improvements that can be made when creating an Ada binding. Some of them were not so obvious and are probably not necessary for classifying a binding as thick; for example, auto-loading function pointers at run-time is simply an artifact of how OpenGL implementations are distributed and is not covered by the OpenGL API itself.

We also discovered that when wrapping a C API in Ada we must lift the interface to a higher level, since Ada is designed to be a higher-level language than C; in this vein, it was natural to add features that are not part of the original API to make it feel more at home in an Ada context.

It might be tempting to write a thin wrapper for your Ada project to avoid overhead, but beware: you will probably still end up writing a thick wrapper. After all, the glue code around calls to thinly wrapped functions, and the data conversions they require, do not simply vanish!

Of course, all this is a lot of work! To give you some numbers: The OpenGLAda repository contains 15,874 lines of Ada code (excluding blanks and comments, tests, and examples) while, for comparison, the C header gl.h (while missing many key features) is only around 3,000 lines.

For All Properties, There Exists a Proof
https://blog.adacore.com/for-all-properties-there-exists-a-proof
Mon, 19 Feb 2018 10:15:00 +0000, by Yannick Moy

With the recent addition of a Manual Proof capability in SPARK 18, it is worth looking at an example which cannot be proved by automatic provers, to see the options that are available for proving it with SPARK. The following code is such an example, where the postcondition of Do_Nothing cannot be proved with provers CVC4 or Z3, although it is exactly the same as its precondition:

subtype Index is Integer range 1 .. 10;
type T1 is array (Index) of Integer;
type T2 is array (Index) of T1;

procedure Do_Nothing (Tab : T2) with
  Ghost,
  Pre  => (for all X in Index => (for some Y in Index => Tab(X)(Y) = X + Y)),
  Post => (for all X in Index => (for some Y in Index => Tab(X)(Y) = X + Y));

procedure Do_Nothing (Tab : T2) is null;

The issue is that SMT provers that we use in SPARK like CVC4 and Z3 do not recognize the similarity between the property assumed here (the precondition) and the property to prove (the postcondition). To such a prover, the formula to prove (the Verification Condition or VC) looks like the following in SMTLIB2 format:

(declare-sort integer 0)
(declare-fun to_rep (integer) Int)
(declare-const tab (Array Int (Array Int integer)))
(assert
  (forall ((x Int))
    (=> (and (<= 1 x) (<= x 10))
        (exists ((y Int))
          (and (and (<= 1 y) (<= y 10))
               (= (to_rep (select (select tab x) y)) (+ x y)))))))
(declare-const x Int)
(assert (<= 1 x))
(assert (<= x 10))
(assert
  (forall ((y Int))
    (=> (and (<= 1 y) (<= y 10))
        (not (= (to_rep (select (select tab x) y)) (+ x y))))))
(check-sat)

We see here some of the encoding from the SPARK programming language to the SMTLIB2 format: the standard integer type Integer is translated into an abstract type integer, with a suitable projection to_rep from this abstract type to the standard type Int of mathematical integers in SMTLIB2; the array types T1 and T2 are translated into SMTLIB2 Array types. The precondition, which is assumed here, is directly transformed into a universally quantified axiom (starting with "forall"). The postcondition, on the other hand, is negated and joined with the other hypotheses, as an SMT solver tries to derive an inconsistency in order to prove the goal by contradiction. So the negated postcondition becomes:

   (for some X in Index => (for all Y in Index => not (Tab(X)(Y) = X + Y)));

The existentially quantified variable X becomes a constant x in the VC, with assertions stating its bounds 1 and 10, and the universal quantification becomes another axiom.

Now it is useful to understand how SMT solvers deal with universally quantified axioms. Obviously, they cannot "try out" every possible value of the parameters. Here, the quantified variable ranges over all mathematical integers! And in general, we may quantify over values of abstract types which cannot be enumerated. Instead, SMT solvers find suitable "candidates" for instantiating the axioms. The main technique for finding such candidates is called trigger-based instantiation. The SMT solver identifies terms in the quantified axiom that contain the quantified variables, and matches them with the so-called "ground" terms in the VC (terms that do not contain quantified or "bound" variables). Here, such a term containing x in the first axiom is (to_rep (select (select tab x) y)), or simply (select tab x), while in the second axiom such a term containing y could be (to_rep (select (select tab x) y)) or (select (select tab x) y). The issue with the VC above is that these terms do not match any ground term, hence neither CVC4 nor Z3 can prove the VC.

Note that Alt-Ergo is able to prove the VC, using the exact same trigger-based mechanism, because it considers (select tab x) from the second axiom as a ground term in matching. Alt-Ergo uses this term to instantiate the first axiom, which in turn provides the term (select (select tab x) sko_y) [where sko_y is a fresh variable corresponding to the skolemisation of the existentially quantified variable y]. Alt-Ergo then uses this new term to instantiate the second axiom, resulting in a contradiction. So Alt-Ergo can deduce that the VC is unsatisfiable, hence proves the original (non-negated) postcondition.

In the following, I am going to consider alternative means of proving such a property when all the SMT provers provided with SPARK fail.

Solution 1: use an alternative automatic prover

As the property to prove is an exact duplication of a known property in hypothesis, a different kind of prover, based on resolution, is a perfect fit. Here, I'm using the E prover, but many others are supported by the Why3 platform used in SPARK and would be as effective. The first step is to install E prover from its website (www.eprover.org) or from its integration in your Linux distro. Then, you need to run the executable why3config to generate a suitable .why3.conf configuration file in your HOME directory, with the information Why3 needs to generate VCs for E prover and to call it. Currently, GNATprove cannot be called with --prover=eprover, so instead I called the underlying Why3 tool directly, and it proves the desired postcondition:

$ why3 prove -L /path/to/theories -P Eprover quantarrays.mlw
quantarrays.mlw Quantarrays__subprogram_def WP_parameter def : Valid (0.02s)

Solution 2: prove interactively

With SPARK 18 comes the possibility to prove a VC interactively inside the GPS editor. Just right-click on the message about the unproved postcondition and select "Start Manual Proof". Various panels are opened in GPS.

Here, the manual proof is really simple. We start by applying axiom H, as the conclusion of this axiom matches the goal to prove; this leaves us with the conditions for applying axiom H. Then we use the known bounds on X in axioms H1 and H2 to prove these conditions. And we're done! GPS then confirms that the VC has been proved. Note that it is possible to call an automatic prover by its name, like "altergo", "cvc4", or "z3", to prove the VC automatically after the initial application of axiom H.

Solution 3: use an alternative interactive prover

It is also possible to use powerful external interactive provers like Coq or Isabelle. You first need to install these on your machine. GNATprove and GPS are directly integrated with Coq, so you can right-click on the unproved postcondition, select "Prove Check", then manually enter the switch "--prover=coq" to select the Coq prover. GPS will then open CoqIDE on the VC.

The proof in Coq is as simple as before, mirroring the steps of the manual proof in GPS. Note that the tactic "auto" in Coq proves this VC automatically.

What to Remember

There are many ways forward when the automatic provers available with GNATprove fail to prove a property. We have already presented on various occasions the use of ghost code. Here we described three other ways: using an alternative automatic prover, proving interactively, and using an alternative interactive prover.
[cover image of Kurt Gödel, courtesy of Wikipedia; Gödel demonstrated that not all true properties can ever be proved]

Bitcoin blockchain in Ada: Lady Ada meets Satoshi Nakamoto
https://blog.adacore.com/bitcoin-in-ada
Thu, 15 Feb 2018 13:00:00 +0000, by Johannes Kanig

Bitcoin is getting a lot of press recently, but let's be honest, that's mostly because a single bitcoin worth 800 USD in January 2017 was worth almost 20,000 USD in December 2017. However, Bitcoin and its underlying blockchain are beautiful technologies that are worth a closer look. Let's take that look with our Ada hat on!

So what's the blockchain?

"Blockchain" is a general term for a database that's maintained in a distributed way and is protected against manipulation of the entries; Bitcoin is the first application of blockchain technology, using it to track transactions of "coins", which are also called Bitcoins.

Conceptually, the Bitcoin blockchain is just a list of transactions. Bitcoin transactions in full generality are quite complex, but as a first approximation, one can think of a transaction as a triple (sender, recipient, amount), so that an initial mental model of the blockchain could look like this:

Sender             | Recipient          | Amount
<Bitcoin address>  | <Bitcoin address>  | 0.003 BTC
<Bitcoin address>  | <Bitcoin address>  | 0.032 BTC
...                | ...                | ...

Other data, such as how many Bitcoins you have, is derived from this simple transaction log and not explicitly stored in the blockchain. Modifying or corrupting this transaction log would allow attackers to appear to have more Bitcoins than they really have, or to spend money, then erase the transaction and spend the same money again. This is why it's important to protect against manipulation of that database.

The list of transactions is not a flat list. Instead, transactions are grouped into blocks.
The blockchain is a list of blocks, where each block has a link to the previous block, so that a block represents the full blockchain up to that point in time. Thinking as a programmer, this could be implemented using a linked list where each block header contains a prev pointer. The blockchain is grown by adding new blocks to the end, with each new block pointing to the former last block, so it makes more sense to use a prev pointer than a next pointer.

In a regular linked list, the prev pointer points directly to the memory used for the previous block. But the uniqueness of the blockchain is that it's a distributed data structure: it's maintained by a network of computers, or nodes. Every Bitcoin full node has a full copy of the blockchain, but what happens if members of the network don't agree on the contents of some transaction or block? A simple memory corruption or malicious act could result in a client having incorrect data. This is why the blockchain has various checks built in that guarantee that corruption or manipulation can be detected.

How does Bitcoin check data integrity?

Bitcoin's internal checks are based on a cryptographic hash function. This is just a fancy name for a function that takes anything as input and spits out a large number as output, with the following properties:

• The output of the function varies greatly and unpredictably even with tiny variations of the input;
• It is extremely hard to deduce an input that produces some specific output number, other than by brute force; that is, by computing the function again and again for a large number of inputs until one finds the input that produces the desired output.

The hash function used in Bitcoin is called SHA256. It produces a 256-bit number as output, usually represented as 64 hexadecimal digits.
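The first property is easy to observe with GNAT's bundled implementation (GNAT.SHA256, which is also used later in this post); a single changed input character yields a completely different digest:

```ada
with Ada.Text_IO; use Ada.Text_IO;
with GNAT.SHA256; use GNAT.SHA256;

procedure Avalanche is
begin
   --  Two inputs differing in a single character produce unrelated
   --  64-hex-digit digests.
   Put_Line (Digest ("hello"));
   Put_Line (Digest ("hellp"));
end Avalanche;
```

Here Digest is the GNAT.SHA256 convenience function that returns the hash of a String directly as its hexadecimal representation.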
Collisions (different input data that produces the same output hash value) are theoretically possible, but the output space is so big that collisions on actual data are considered extremely unlikely, in fact practically impossible.

The idea behind the first check of Bitcoin's data integrity is to replace a raw pointer to a memory region with a "safe pointer" that can, by construction, only point to data that hasn't been tampered with. The trick is to use the hash value of the data in the block as the "pointer" to the data. So instead of a raw pointer, one stores the hash of the previous block as the prev pointer. Here, I've abbreviated the 256-bit hash values by their first two and last four hex digits – by design, Bitcoin block hashes always start with a certain number of leading zeroes. The first block contains a "null pointer" in the form of an all-zero hash.

Given a hash value, it is infeasible to compute the data associated with it, so one can't really "follow" a hash like one can follow a pointer to get to the real data. Therefore, some sort of table is needed to store the data associated with each hash value.

Now what have we gained? The structure can no longer easily be modified. If someone modifies any block, its hash value changes, and all existing pointers to it are invalidated (because they contain the wrong hash value). If, for example, the following block is updated to contain the new prev pointer (i.e., hash), its own hash value changes as well. The end result is that the whole data structure needs to be completely rewritten even for small changes (following prev pointers in reverse order starting from the change). In fact, such a rewrite never occurs in Bitcoin, so one ends up with an immutable chain of blocks. However, one needs to check (for example, when receiving blocks from another node in the network) that the block pointed to really has the expected hash.
Block data structure in Ada

To make the above explanations more concrete, let's look at some Ada code (you may also want to have the Bitcoin documentation available). A Bitcoin block is composed of the actual block contents (the list of transactions of the block) and a block header. The entire type definition of the block looks like this (you can find all the code in this post, plus some supporting code, in this github repository):

type Block_Header is record
   Version     : Uint_32;
   Prev_Block  : Uint_256;
   Merkle_Root : Uint_256;
   Timestamp   : Uint_32;
   Bits        : Uint_32;
   Nonce       : Uint_32;
end record;

type Transaction_Array is array (Integer range <>) of Uint_256;

type Block_Type (Num_Transactions : Integer) is record
   Header       : Block_Header;
   Transactions : Transaction_Array (1 .. Num_Transactions);
end record;

As discussed, a block is simply the list of transactions plus the block header, which contains additional information. Of the fields in the block header, you only need to understand two for this blog post:

• Prev_Block - a 256-bit hash value for the previous block (this is the prev pointer I mentioned before);
• Merkle_Root - a 256-bit hash value which summarizes the contents of the block and guarantees that when the contents change, the block header changes as well. I will explain how it is computed later in this post.

The only piece of information that's missing is that Bitcoin usually applies the SHA256 hash function twice to compute a hash. So instead of just computing SHA256(data), usually SHA256(SHA256(data)) is computed. One can write such a double hash function in Ada as follows, using the GNAT.SHA256 library and String as a type for a data buffer (we assume a little-endian architecture throughout the document, but you can use the GNAT compiler's Scalar_Storage_Order feature to make this code portable):

with Ada.Unchecked_Conversion;
with GNAT.SHA256; use GNAT.SHA256;

function Double_Hash (S : String) return Uint_256 is
   D : Binary_Message_Digest := Digest (S);
   T : String (1 .. 32);
   for T'Address use D'Address;

   D2 : constant Binary_Message_Digest := Digest (T);

   function To_Uint_256 is new Ada.Unchecked_Conversion
     (Source => Binary_Message_Digest, Target => Uint_256);
begin
   return To_Uint_256 (D2);
end Double_Hash;

The hash of a block is simply the hash of its block header. This can be expressed in Ada as follows (assuming that the size in bits of the block header, Block_Header'Size in Ada, is a multiple of 8):

function Block_Hash (B : Block_Type) return Uint_256 is
   S : String (1 .. Block_Header'Size / 8);
   for S'Address use B.Header'Address;
begin
   return Double_Hash (S);
end Block_Hash;

Now we have everything we need to check the integrity of the outermost layer of the blockchain. We simply iterate over all blocks and check that the previous block indeed has the hash used to point to it:

declare
   Cur : String :=
     "00000000000000000044e859a307b60d66ae586528fcc6d4df8a7c3eff132456";
   S   : String (1 .. 64);
begin
   loop
      declare
         B : constant Block_Type := Get_Block (Cur);
      begin
         S := Uint_256_Hex (Block_Hash (B));
         Put_Line ("checking block hash = " & S);
         if not Same_Hash (S, Cur) then
            Ada.Text_IO.Put_Line ("found block hash mismatch");
         end if;
         Cur := Uint_256_Hex (B.Prev_Block);
      end;
   end loop;
end;

A few explanations: the Cur string contains the hash of the current block as a hexadecimal string. At each iteration, we fetch the block with this hash (details in the next paragraph) and compute the actual hash of the block using the Block_Hash function. If everything matches, we set Cur to the contents of the Prev_Block field. Uint_256_Hex is the function to convert a hash value in memory to its hexadecimal representation for display.

One last step is to get the actual blockchain data. The size of the blockchain is now 150 GB and counting, so this is actually not so straightforward! For this blog post, I added 12 blocks in JSON format to the github repository, making it self-contained.
The Get_Block function reads a file with the same name as the block hash to obtain the data, starting at a hardcoded block with the hash mentioned in the code. If you want to verify the whole blockchain using the above code, you have to either query the data using some website such as blockchain.info, or download the blockchain to your computer, for example using the Bitcoin Core client, and update Get_Block accordingly.

How to compute the Merkle Root Hash

So far, we were able to verify the proper chaining of the blockchain, but what about the contents of each block? The objective is now to come up with the Merkle root hash mentioned earlier, which is supposed to "summarize" the block contents: that is, it should change for any slight change of the input.

First, each transaction is again identified by its hash, similar to how blocks are identified. So now we need to compute a single hash value from the list of hashes for the transactions of the block. Bitcoin uses a hash function which combines two hashes into a single hash:

function SHA256Pair (U1, U2 : Uint_256) return Uint_256 is
   type A is array (1 .. 2) of Uint_256;
   X : A := (U1, U2);
   S : String (1 .. X'Size / 8);
   for S'Address use X'Address;
begin
   return Double_Hash (S);
end SHA256Pair;

Basically, the two numbers are put side by side in memory and the result is hashed using the double hash function. Now we could just iterate over the list of transaction hashes, using this combining function to come up with a single value. But it turns out Bitcoin does it a bit differently: hashes are combined using a scheme called a Merkle tree. One can imagine the transactions (T1 to T6 in the example) being stored at the leaves of a binary tree, where each inner node carries a hash which is the combination of the two child hashes. For example, H7 is computed from H1 and H2. The root node carries the "Merkle root hash", which in this way summarizes all transactions.
However, this image of a tree is just that: an image to show the order of the hash computations that need to be done to compute the Merkle root hash. There is no actual tree stored in memory.

There is one peculiarity in the way Bitcoin computes the Merkle hash: when a row has an odd number of elements, the last element is combined with itself to compute the parent hash. You can see this in the picture, where H9 is used twice to compute H11. The Ada code for this is quite straightforward:

function Merkle_Computation (Tx : Transaction_Array) return Uint_256 is
   Max  : Integer :=
     (if Tx'Length rem 2 = 0 then Tx'Length else Tx'Length + 1);
   Copy : Transaction_Array (1 .. Max);
begin
   if Tx'Length = 1 then
      return Tx (Tx'First);
   end if;
   if Tx'Length = 0 then
      raise Program_Error;
   end if;
   Copy (1 .. Tx'Length) := Tx;
   if Max /= Tx'Length then
      Copy (Max) := Tx (Tx'Last);
   end if;
   loop
      for I in 1 .. Max / 2 loop
         Copy (I) := SHA256Pair (Copy (2 * I - 1), Copy (2 * I));
      end loop;
      if Max = 2 then
         return Copy (1);
      end if;
      Max := Max / 2;
      if Max rem 2 /= 0 then
         Copy (Max + 1) := Copy (Max);
         Max := Max + 1;
      end if;
   end loop;
end Merkle_Computation;

Note that despite the name, the input array only contains transaction hashes and not actual transactions. A copy of the input array is created at the beginning; after each iteration of the loop, it contains one level of the Merkle tree. Both before and inside the loop, if statements check for the edge case of combining an odd number of hashes at a given level.

We can now update our checking code to also check the correctness of the Merkle root hash for each checked block. You can check out the whole code from this repository; the branch "blogpost_1" will stay there to point to the code as shown here.

Why does Bitcoin compute the hash of the transactions in this way? Because it allows for a more efficient way to prove to someone that a certain transaction is in the blockchain.
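That efficiency rests on recomputing the root from just a handful of hashes. Using the SHA256Pair function from above, such a check could look like this hypothetical helper; Side_Array and Merkle_Root_From_Proof are illustrative names, not part of the repository's code, and Is_Left (I) says whether the I-th sibling hash sits to the left of the running hash:

```ada
type Side_Array is array (Integer range <>) of Boolean;

--  Recompute the Merkle root from one transaction hash plus the
--  sibling hashes along its path to the root.  The caller then
--  compares the result with the Merkle_Root field of the block
--  header.  Siblings and Is_Left are assumed to have the same bounds.
function Merkle_Root_From_Proof
  (Leaf     : Uint_256;
   Siblings : Transaction_Array;
   Is_Left  : Side_Array) return Uint_256
is
   Current : Uint_256 := Leaf;
begin
   for I in Siblings'Range loop
      if Is_Left (I) then
         Current := SHA256Pair (Siblings (I), Current);
      else
         Current := SHA256Pair (Current, Siblings (I));
      end if;
   end loop;
   return Current;
end Merkle_Root_From_Proof;
```

The work is proportional to the depth of the tree, i.e. logarithmic in the number of transactions, instead of linear.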
Suppose you want to show someone that you sent her the required amount of Bitcoin to buy some product. The person could, of course, download the entire block you indicate and check for themselves, but that's inefficient. Instead, you could present them with the chain of hashes that leads to the root hash of the block. If the transaction hashes were combined linearly, you would still have to show them the entire list of transactions that come after yours in the block. But with the Merkle hash, you can present them with a "Merkle proof": that is, just the hashes required to compute the path from your transaction to the Merkle root. In our example, if your transaction is T3, it's enough to also provide H4, H7 and H11: the other person can compute the Merkle root hash from those and compare it with the "official" Merkle root hash of that block.

When I first saw this explanation, I was puzzled why an attacker couldn't modify transaction T3 to T3b and then "invent" the hashes H4b, H7b and H11b so that the Merkle root hash H12 is unchanged. But the cryptographic nature of the hash function prevents this: today, there is no known attack against the SHA256 hash function used in Bitcoin that would allow inventing such input values (though for the weaker hash function SHA1 such collisions have been found).

Wrap-Up

In this blog post I have shown Ada code that can be used to verify the data integrity of blocks from the Bitcoin blockchain. I was able to check the block and Merkle root hashes for all the blocks in the blockchain in a few hours on my computer, though most of the time was spent in input/output to read the data in. There are many more rules that make a block valid, most of them related to transactions. I hope to cover some of them in later blog posts.
The Road to a Thick OpenGL Binding for Ada: Part 1
https://blog.adacore.com/the-road-to-a-thick-opengl-binding-for-ada
Mon, 05 Feb 2018 15:59:00 +0000, by Felix Krause

This blog post is part one of a tutorial based on the OpenGLAda project and will cover some of the background of the OpenGL API and the basic steps involved in importing platform-dependent C functions.

Motivation

Ada was designed from its onset in the late 70s to be highly compatible with other languages; for example, there are currently native facilities for directly using libraries from C, FORTRAN, COBOL, C++, and even Java. However, there is still a process (although automatable to a certain extent) that must be followed to safely and effectively import an API, or create what we will refer to here as a binding. Additionally, foreign APIs may not be the most efficient or user-friendly for direct use in Ada, so it is often considered useful to go beyond a simple (thin) binding and instead craft a small custom library (a thick binding) above the original API to solidify and greatly simplify its use within the Ada language.

In this blog post I will describe the design decisions and architecture of OpenGLAda, a thick binding to the OpenGL API for Ada, and, in the process, I hope to provide ideas and techniques that may inspire others to contribute their own bindings for similar libraries. Examples based on the classic OpenGL SuperBible show what is possible using the OpenGLAda binding; their complete source can be found on my GitHub repo for OpenGLAda, along with instructions for setting up an Ada environment.

Background

OpenGL, created in 1991 by Silicon Graphics, has had a long history as an industry standard for rendering 3D vector graphics, growing through numerous revisions (currently at version 4.6), both adding new features and deprecating or removing others.
As a result, the once simple API has become more complex and at times difficult to wield. Despite this, and even with the competition of Microsoft's DirectX and the creation of new APIs (like Vulkan), OpenGL remains a big player in the Linux, Mac, and free-software world.

Unlike a typical C library, OpenGL has hundreds (maybe even thousands) of implementations, usually provided by graphics hardware vendors. While the OpenGL API itself is considered platform-independent, making use of it depends heavily on the target platform's graphics and windowing systems. This is because rendering requires a so-called OpenGL context, consisting of a drawing area on the screen and all associated data needed for rendering. For this reason, there exist multiple glue APIs that enable using OpenGL in conjunction with various windowing systems.

Design Challenges

A concept that pervades the design of OpenGL is graceful degradation: if some feature or function is unavailable on a target platform, the client software may supply a workaround or simply skip the part of the rendering process in which the feature is required. This makes it necessary to query for existing features at run-time. Additionally, the code for querying OpenGL features is not part of the OpenGL API itself and must be provided by us and defined separately for each platform we plan to support. These properties pose the following challenges for our Ada binding:

1. It must include some platform-dependent code, ideally hiding this from the user to enable platform-independent usage.
2. It must access OpenGL features without directly linking to them, so that missing features can be handled inside the application.

First Steps

Note: I started working on OpenGLAda in 2012, so it only uses the features of the Ada 2005 language level. Some code shown here could be written in a more succinct way with the constructs added in Ada 2012 (most notably aspects and expression functions).
To get started on our binding, we need to translate subprogram definitions from the standard OpenGL C header into Ada. Since we are writing a thick binding rather than directly exposing the original C functions, these API imports should be invisible to the user. Thus, we will define a set of private packages such as GL.API to house all of these imports. A private package can only be used by its immediate parent package and that package's children, making it invisible to a user of the library. The public package GL and its public child packages will provide the public interface.

To translate a C subprogram declaration to Ada, we need to map all the C types it uses to equivalent Ada types and then essentially change the syntax from C to Ada. For the first import, we choose the following subprogram:

void glFlush();

This is a command used to tell OpenGL to execute commands currently stored in internal buffers. It is a very common command and thus is placed directly in the top-level package of the public interface. Since the command has no parameters and returns no values, there are no types involved, so we don't need to care about them for now. Our Ada code looks like this:

package GL is
   procedure Flush;
end GL;

private package GL.API is
   procedure Flush;
   pragma Import (Convention    => C,
                  Entity        => Flush,
                  External_Name => "glFlush");
end GL.API;

package body GL is
   procedure Flush is
   begin
      API.Flush;
   end Flush;
end GL;

Instead of providing an implementation of GL.API.Flush in a package body, we use the pragma Import to tell the Ada compiler that we are importing this subprogram from another library. The first parameter is the calling convention, which defines low-level details about how a subprogram call is translated into machine code. It is vital that the caller and the callee agree on the same calling convention; a mistake at this point is hard to detect and, in the worst case, may lead to memory corruption at run-time.
Note that when defining the implementation of the public subprogram GL.Flush, we cannot use a renames clause as we typically would, because our imported backend subprogram is inside a private package.

Now, the interesting part: how do we link to the appropriate OpenGL implementation for the system we are targeting? Not only are there multiple implementations, but their link names also differ. The solution is to use the GPRBuild tool and define a scenario variable to select the correct linker flags:

   library project OpenGL is
      type Windowing_System_Type is
        ("windows", -- Microsoft Windows
         "x11",     -- X Window System (primarily used on Linux)
         "quartz"); -- Quartz Compositor (the macOS window manager)

      Windowing_System : Windowing_System_Type := external ("Windowing_System");

      for Languages use ("ada");
      for Library_Name use "OpenGLAda";
      for Source_Dirs use ("src");

      package Compiler is
         for Default_Switches ("ada") use ("-gnat05");
      end Compiler;

      package Linker is
         case Windowing_System is
            when "windows" =>
               for Linker_Options use ("-lOpenGL32");
            when "x11" =>
               for Linker_Options use ("-lGL");
            when "quartz" =>
               for Linker_Options use ("-Wl,-framework,OpenGL");
         end case;
      end Linker;
   end OpenGL;

We will need other distinctions based on the windowing system later, so we name the scenario variable Windowing_System accordingly, although at this point it would also be sensible to distinguish just the operating system. We use Linker_Options instead of Default_Switches in the Linker package to tell GPRBuild which options are needed when linking the final executable. As you can see, the library we link against is called OpenGL32 on Windows and GL on Linux. On macOS, there is the concept of frameworks, which are somewhat more sophisticated software libraries. On the gcc command line, they are given with "-framework <name>", which gcc hands over to the linker.
However, this does not work easily with GPRBuild unless we use the "-Wl,option" flag, whose operation is defined as:

   Pass "option" as an option to the linker. If option contains commas, it is split into multiple options at the commas. You can use this syntax to pass an argument to the option.

At this point, we have almost successfully wrapped our first OpenGL subprogram. However, there is a nasty little detail we overlooked: Windows APIs use a calling convention different from the standard C one. One usually only needs to care about this when linking against the Win32 API; however, OpenGL is considered part of the Windows API, as we can see in the OpenGL C header:

   GLAPI void APIENTRY glFlush (void);

... and by digging through the Windows version of this header, we eventually find, wrapped in some #ifdef's, this line:

   #define APIENTRY __stdcall

This means our target C function has the calling convention stdcall, which is only used on Windows. Thankfully, GNAT supports this calling convention and, moreover, defines it as a synonym for the C calling convention on every system that is not Windows. Thus, we can rewrite our import:

   procedure Flush;
   pragma Import (Convention    => Stdcall,
                  Entity        => Flush,
                  External_Name => "glFlush");

With the above code, our first wrapper subprogram is ready. Stay tuned for part two, where we will cover a basic type system for interfacing with C, error handling, memory management, and more! Part two of this article can be found here!

AdaCore at FOSDEM 2018
Pierre-Marie de Rodat, Thu, 18 Jan 2018 (https://blog.adacore.com/adacore-at-fodsem-2018)

Every year, free and open source enthusiasts gather in Brussels (Belgium) for two days of FLOSS-related conferences. The FOSDEM organizers set up several "developer rooms", which are venues that host talks on specific topics.
This year, the event will happen on the 3rd and 4th of February (Saturday and Sunday), and there is a room dedicated to the Ada programming language. Just like last year and the year before, several AdaCore engineers will be there. We have five talks scheduled, spread across the Ada devroom, the Embedded, mobile and automotive devroom, and the Source Code Analysis devroom. Note also, in the Embedded, mobile and automotive devroom, the talk from Alexander Senier about the work they are doing at Componolit, which uses SPARK and Genode to bring trust to the Android platform. If you happen to be in the area, please come and say hi!

Leveraging Ada Run-Time Checks with Fuzz Testing in AFL
Lionel Matias, Tue, 19 Dec 2017 (https://blog.adacore.com/running-american-fuzzy-lop-on-your-ada-code)

Fuzzing is a very popular bug-finding method. The concept, very simply, is to continuously inject random (garbage) data as input to a software component and wait for it to crash. Google's Project Zero team made it one of their major vulnerability-finding tools (at Google scale). It is very efficient at robustness-testing file format parsers, antivirus software, internet browsers, JavaScript interpreters, font face libraries, system calls, file systems, databases, web servers, DNS servers... When Heartbleed came out, people found out that it was, indeed, easy to find. Google even launched a free service to fuzz widely used open-source libraries at scale, and Microsoft created Project Springfield, a commercial service to fuzz your applications (also at scale, in the cloud).

Writing robustness tests can be tedious, and we, as developers, are usually bad at it.
But when your application is on an open network, or just user-facing, or you have thousands of appliances in the wild that might face problems (disk, network, cosmic rays :-)) you won't see in your test lab, you might want to double- and triple-check that your parsers, deserializers, and decoders are as robust as possible. In my experience, fuzzing causes so many unexpected crashes in so many software parts that it's not unusual to spend more time doing crash triage than preparing a fuzzing session. It is a great verification tool to complement structured testing, static analysis, and code review.

Ada is pretty interesting for fuzzing, since all the runtime checks (the ones your compiler couldn't enforce statically) and all the defensive code you've added (through pre-/post-conditions, asserts, ...) can be leveraged as fuzzing targets. Here's a recipe to use American Fuzzy Lop on your Ada code.

American Fuzzy Lop

AFL is a fuzzer from Michał Zalewski (lcamtuf), of the Google security team. It has an impressive trophy case of bugs and vulnerabilities found in dozens of open-source libraries and tools. You can see for yourself how efficient guided fuzzing is in the now classic "pulling JPEGs out of thin air" demonstration on the lcamtuf blog. I invite you to read the technical description of the tool to get a precise idea of the innards of AFL. Installation instructions are covered in the Quick Start Guide, and they can be summed up as:

1. Get the latest source
2. Build it (make)
3. Make your program ready for fuzzing (here you have to work a bit)
4. Start fuzzing...

There are two main parts of the tool:

• afl-clang/afl-gcc to instrument your binary
• afl-fuzz, which runs your binary and uses the instrumentation to guide the fuzzing session.

Instrumentation

afl-clang / afl-gcc compiles your code and adds a simple instrumentation around branch instructions. The instrumentation is similar to gcov or profiling instrumentation, but it targets basic blocks.
In the clang world, afl-clang-fast uses a plug-in to add the instrumentation cleanly (the compiler knows about all basic blocks, and it's very easy to add some code at the start of a basic block in clang). In the gcc world, the tool only provides a hacky solution: instead of calling your GCC of predilection, you call afl-gcc. afl-gcc will then call your GCC to output the assembly code generated from your code. To simplify, afl-gcc patches every jump instruction and every label (jump destination) to append an instrumentation block. It then calls your assembler to finish the compilation job. Since it is a pass on assembly code generated by GCC, it can be used to fuzz Ada code compiled with GNAT (since GNAT is based on GCC). In the gprbuild world, this means calling gprbuild with the --compiler-subst=lang,tool option (see the gprbuild manual).

Note: afl-gcc will override compilation options to force -O3 -funroll-loops. The reason is that the authors of AFL noticed that those optimization options helped with the coverage instrumentation (unrolling loops will add new jump instructions).

With some codebases, a problem can appear with the 'rep ret' instruction. For obscure reasons, gcc sometimes inserts a 'rep ret' instruction instead of a 'ret' (return) instruction. There is some info in the gcc mailing list archives and, in more detail if you dare, on a dedicated website called repzret.org. When AFL inserts its instrumentation code, the 'rep ret' instruction is no longer correct ('as' complains). Since 'rep ret' is exactly the same instruction as 'ret' (except a bit slower on some AMD architectures), you can add a step in afl-as (the assembly patching module) to patch the (already patched) assembly code: add the following code at line 269 in afl-as.c (on the 2.51b or 2.52b versions):

   if (!strncmp(line, "\trep ret", 8)) {
     SAYF("[LMA patch] afl-as : replace 'rep ret' with (only) 'ret'\n");
     fputs("\tret\n", outf);
     continue;
   }

... and then recompile AFL.
It then works fine, and prints a specific message whenever it encounters the problematic case. I didn't need this workaround for the example programs I chose for this post (you probably won't need it), but it can happen, so here you go... Though a bit hacky, going through the assembly and sed-patching it seems to be the only way to do this on gcc, for now. It's obviously not available as such on any other architecture (Power, ARM), as afl-as inserts an x86-specific payload. Someone wrote a gcc plug-in once, and it would need some love to be ported to the gcc 6 series (recent GNAT) or 8 series (future GNAT). The plug-in approach would also allow in-process fuzzing, speed up the fuzzing process, and ease the fuzzing of programs with a large initialization/set-up time. When you don't have the source code, or changing your build chain would be too hard, the afl-fuzz manual mentions a QEMU-based option. I haven't tried it, though.

The test-case generator

afl-fuzz takes a bunch of valid inputs to your application, applies a wide variety of random mutations, runs your application on the results, and then uses the inserted instrumentation to guide itself to new code paths, while avoiding paths that already crash. AFL looks for crashes. It is expecting a call to abort() (SIGABRT). Its job is to try and crash your software; its search target is "a new unique crash". It's not very common to get a core dump (SIGSEGV/SIGABRT) in Ada with GNAT, even following an uncaught top-level exception. You'll have to help the fuzzer and provoke core dumps on the errors you want to catch; a top-level exception by itself won't do it. In the GNAT world, you can dump core using the Core_Dump procedure in the GNAT-specific package GNAT.Exception_Actions. What I usually do is let all exceptions bubble up to a top-level exception handler, filter by name, and only crash/abort on the exceptions I'm interested in.
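The pattern just described can be sketched in a few lines. This is a toy example of my own, not code from any of the libraries discussed below; the exception and subprogram names are invented:

```ada
--  Toy sketch: contracts/assertions turn logic errors into exceptions,
--  and the top-level handler converts the interesting ones into a core
--  dump that AFL counts as a crash. All names here are invented.
with Ada.Exceptions;
with Ada.Text_IO;
with GNAT.Exception_Actions;

procedure Harness_Sketch is
   pragma Assertion_Policy (Check);  --  same effect as building with -gnata

   Parse_Error : exception;  --  stand-in for a library's "expected" exception

   function Checked_Div (A, B : Integer) return Integer
     with Pre => B /= 0;  --  violating this raises Assert_Failure at run time

   function Checked_Div (A, B : Integer) return Integer is
   begin
      return A / B;
   end Checked_Div;
begin
   Ada.Text_IO.Put_Line (Integer'Image (Checked_Div (10, 2)));
exception
   when Occurrence : others =>
      if Ada.Exceptions.Exception_Name (Occurrence) =
           "HARNESS_SKETCH.PARSE_ERROR"
      then
         null;  --  controlled failure on malformed input: not a finding
      else
         GNAT.Exception_Actions.Core_Dump (Occurrence);  --  SIGABRT for AFL
      end if;
end Harness_Sketch;
```

The concrete harnesses below for Zip-Ada, AdaYaml, and GNATCOLL.JSON all follow this shape.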
And if the bug you're trying to find with fuzzing doesn't crash your application, make it a crashing bug. With all that said, let's find some open-source libraries to fuzz.

Fuzzing Zip-Ada

Zip-Ada is a nice pure-Ada library to work with zip archives. It can open, extract, compress, and decompress most of the possible kinds of zip files; it has even recently implemented LZMA compression. It's 100% Ada, portable, quite readable, and simple to use (drop the source in, use the gpr file, look at the examples and you're set). And it's quite efficient (according to my own informal benchmarks). Anyway, it's a cool project to contribute to, but I'm no compression wizard. Instead, let's try and fuzz it. Since it's a library that can be given arbitrary files, maybe of dubious origin, it needs to be robust.

I got the source of version 52 from SourceForge (or, if you prefer, from GitHub), uncompressed it, and found the gprbuild file. Conveniently, Zip-Ada comes with a debug mode that enables all possible runtime checks from GNAT, including -gnatVa and -gnato. The zipada.gpr file also references a pragma file (through -gnatec=debug.pra) that contains a 'pragma Initialize_Scalars;' directive, so everything is OK on the build side. Then we need a very simple test program that takes a file name as a command-line argument and drives the library from there. File parsers are the juiciest targets, so let's read and parse a file: we'll open and extract a zip file. For a first program, what we're looking for is procedure Extract in the Unzip package:

   --  Extract all files from an archive (from)
   procedure Extract (From                 : String;
                      Options              : Option_set := No_Option;
                      Password             : String := "";
                      File_System_Routines : FS_Routines_Type := Null_Routines);

Just give it a file name and it will (try to) parse it as an archive and extract all the files from it.
We also need to give AFL what it needs (abort() / core dump), so let's add a top-level exception block that will do that, unconditionally (at first) on any exception. The example program looks like:

   with UnZip; use UnZip;
   with Ada.Command_Line;
   with GNAT.Exception_Actions;
   with Ada.Exceptions;
   with Ada.Text_IO; use Ada.Text_IO;

   procedure Test_Extract is
   begin
      Extract (From                 => Ada.Command_Line.Argument (1),
               Options              => (Test_Only => True, others => False),
               Password             => "",
               File_System_Routines => Null_Routines);
   exception
      when Occurence : others =>
         Put_Line ("exception occurred ["
                   & Ada.Exceptions.Exception_Name (Occurence) & "] ["
                   & Ada.Exceptions.Exception_Message (Occurence) & "] ["
                   & Ada.Exceptions.Exception_Information (Occurence) & "]");
         GNAT.Exception_Actions.Core_Dump (Occurence);
   end Test_Extract;

And to have it compile, we add it to the list of main programs in the zipada.gpr file. Then let's build:

   gprbuild --compiler-subst=Ada,/home/lionel/afl/afl-2.51b/afl-gcc -p -P zipada.gpr -Xmode=debug

We get a classic gprbuild display, with some additional lines:

   ...
   afl-gcc -c -gnat05 -O2 -gnatp -gnatn -funroll-loops -fpeel-loops -funswitch-loops -ftracer -fweb -frename-registers -fpredictive-commoning -fgcse-after-reload -ftree-vectorize -fipa-cp-clone -ffunction-sections -gnatec../za_elim.pra zipada.adb
   afl-cc 2.51b by <lcamtuf@google.com>
   afl-as 2.51b by <lcamtuf@google.com>
   [+] Instrumented 434 locations (64-bit, non-hardened mode, ratio 100%).
   afl-gcc -c -gnat05 -O2 -gnatp -gnatn -funroll-loops -fpeel-loops -funswitch-loops -ftracer -fweb -frename-registers -fpredictive-commoning -fgcse-after-reload -ftree-vectorize -fipa-cp-clone -ffunction-sections -gnatec../za_elim.pra comp_zip.adb
   afl-cc 2.51b by <lcamtuf@google.com>
   afl-as 2.51b by <lcamtuf@google.com>
   [+] Instrumented 45 locations (64-bit, non-hardened mode, ratio 100%).
   ...

The two additional afl-gcc and afl-as steps show up, along with a counter of instrumented locations in the assembly code for each unit.
So, some instrumentation was inserted. Fuzzers are bad with checksums (http://moyix.blogspot.fr/2016/07/fuzzing-with-afl-is-an-art.html is an interesting dive into what can block afl-fuzz and what can be done about it, and John Regehr had a blog post on what AFL is bad at). For example, there's no way for a fuzzing tool to get through a checksum test: it would need to generate only test cases that have a matching checksum. So, to make sure we get somewhere, I removed all checksum tests. There was one for the zip CRC, and another one for zip passwords, for similar reasons. After I commented out those tests, I recompiled the test program.

Then we need to set up a fuzzing environment:

   mkdir fuzzing-session
   mkdir fuzzing-session/input
   mkdir fuzzing-session/output

We also need to bootstrap the fuzzer with an initial corpus that doesn't crash. If there's a test suite, put the correct files in input/. Then afl-fuzz can (finally) be launched:

   AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=ON \
   /home/lionel/afl/afl-2.51b/afl-fuzz -m 1024 -i input -o output ../test_extract @@

   -i dir  - input directory with test cases
   -o dir  - output directory for fuzzer findings
   -m megs - memory limit for child process (50 MB)

@@ tells afl to put the input file as a command-line argument; by default, AFL writes to the program's stdin. The AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=ON prelude silences a warning from afl-fuzz about how your system handles core dumps (see the man page for core). For afl-fuzz it's a problem because whatever is done to handle core dumps on your system might take some time, and afl-fuzz will think the program timed out (although it crashed). For you it can also be a problem: on some Linux distros, the system may be configured to do something with core dumps automatically (send a UI notification, fill your /var/log/messages, send a crash report e-mail to your sysadmin, ...), and you might not care for that.
Maybe check first with your sysadmin... If you're root on your machine, follow afl-fuzz's advice and change your /proc/sys/kernel/core_pattern to something sensible.

Let's go: in less than 2 minutes, afl-fuzz finds several crashes. While it says they're "unique", they in fact trigger the same 2 or 3 exceptions. After 3 hours, it "converges" to a list of crashes, and letting it run for 3 days doesn't bring another one. It got a string of CONSTRAINT_ERRORs:

• CONSTRAINT_ERROR : unzip.adb:269 range check failed
• CONSTRAINT_ERROR : zip.adb:535 range check failed
• CONSTRAINT_ERROR : zip.adb:561 range check failed
• CONSTRAINT_ERROR : zip-headers.adb:240 range check failed
• CONSTRAINT_ERROR : unzip-decompress.adb:650 range check failed
• CONSTRAINT_ERROR : unzip-decompress.adb:712 index check failed
• CONSTRAINT_ERROR : unzip-decompress.adb:1384 access check failed
• CONSTRAINT_ERROR : unzip-decompress.adb:1431 access check failed
• CONSTRAINT_ERROR : unzip-decompress.adb:1648 access check failed

I sent those and the reproducers to Gautier de Montmollin (Zip-Ada's maintainer). He corrected them quickly (revisions 587 up to 599); most of these errors are now raised as Zip-Ada-specific exceptions. He also decided to rationalize the list of exceptions that could (for legitimate reasons) be raised from the Zip-Ada decoding code. It also got some ADA.IO_EXCEPTIONS.END_ERROR:

• ADA.IO_EXCEPTIONS.END_ERROR : zip.adb:894
• ADA.IO_EXCEPTIONS.END_ERROR : s-ststop.adb:284 instantiated at s-ststop.adb:402

I redid another fuzzing session after all the corrections and improvements, confirming the list of exceptions. This wasn't a lot of work (for me), mostly using cycles on my machine that I wasn't using anyway, and I got a nice thanks for contributing :-).

Fuzzing AdaYaml

AdaYaml is a library to parse YAML files in Ada. Let's start by cloning the GitHub repository (the version before all the corrections).
For those not familiar with git (here's a tutorial):

   git clone https://github.com/yaml/AdaYaml.git
   git checkout 5616697b12696fd3dcb1fc01a453a592a125d6dd

Then the source code of the version I tested should be in the AdaYaml folder. If you don't want anything to do with git, GitHub has a feature to download a Zip archive of any version of a repository.

AdaYaml asks for a bit more work to fuzz: we need to create a simple example program, add some compilation options to the GPR files (-gnatVa, -gnato), and add a pragma configuration file to set pragma Initialize_Scalars. This last option, combined with -gnatVa, helps surface accesses to uninitialized variables (if you don't know the option, see https://gcc.gnu.org/onlinedocs/gcc-4.6.3/gnat_rm/Pragma-Initialize_005fScalars.html and http://www.adacore.com/uploads/technical-papers/rtchecks.pdf). All these options are there to make sure we catch as many problems as possible with runtime checks. The example program looks like:

   with Utils;
   with Ada.Text_IO;
   with Ada.Command_Line;
   with GNAT.Exception_Actions;
   with Ada.Exceptions;
   with Yaml.Dom;
   with Yaml.Dom.Vectors;
   with Yaml.Dom.Loading;
   with Yaml.Dom.Dumping;
   with Yaml.Events.Queue;

   procedure Yaml_Test is
      S : constant String := Utils.File_Content (Ada.Command_Line.Argument (1));
   begin
      Ada.Text_IO.Put_Line (S);
      declare
         V : constant Yaml.Dom.Vectors.Vector :=
           Yaml.Dom.Loading.From_String (S);
         E : constant Yaml.Events.Queue.Reference :=
           Yaml.Dom.Dumping.To_Event_Queue (V);
         pragma Unreferenced (E);
      begin
         null;
      end;
   exception
      when Occurence : others =>
         Ada.Text_IO.Put_Line (Ada.Exceptions.Exception_Information (Occurence));
         GNAT.Exception_Actions.Core_Dump (Occurence);
   end Yaml_Test;

The program just reads a file and parses it, transforms it into a vector of DOM objects, then transforms those back into a list of events (see the API docs). The YAML reference spec may help explain a bit of what's going on here.
Using the following diagram, and for those well-versed in YAML:

• the V variable (of our test program) is a "Representation" generated via the Parse -> Compose path;
• the E variable is an "Event Tree" generated from V via "Serialize" (so, going back down to a lower-level representation from the DOM tree).

For this specific fuzzing test, the idea is not to stop at the first stage of parsing but also to go a bit through the data that was decoded and do something with it (here we stop short of a round-trip to text; we just go back to an Event Tree). Sometimes a parser faced with incoherent input will keep on going (fail silently) and won't fill (initialize) some fields. The GPR files to patch are yaml.gpr and the parser_tools.gpr subproject.

The first fuzzing session triggers "expected" exceptions from the parser:

• YAML.PARSER_ERROR
• YAML.COMPOSER_ERROR
• LEXER.LEXER_ERROR
• YAML.STREAM_ERROR (as it turns out, this one is also unexpected... more on it later)

These should happen on malformed input. So, to get unexpected crashes and only those, let's filter them in the top-level exception handler:

   exception
      when Occurence : others =>
         declare
            N : constant String := Ada.Exceptions.Exception_Name (Occurence);
         begin
            Ada.Text_IO.Put_Line
              (Ada.Exceptions.Exception_Information (Occurence));
            if N = "YAML.PARSER_ERROR"
              or else N = "LEXER.LEXER_ERROR"
              or else N = "YAML.STREAM_ERROR"
              or else N = "YAML.COMPOSER_ERROR"
            then
               null;
            else
               GNAT.Exception_Actions.Core_Dump (Occurence);
            end if;
         end;
   end Yaml_Test;

Then I recompiled, used some YAML example files as a startup corpus, and started fuzzing. After 4 minutes 30 seconds, the first crashes appeared. I let it run for hours, then a day, and found a list of issues. I sent all of those and the reproducers to Felix Krause (maintainer of the AdaYaml project). He was quick to answer and analyse all the exceptions.
Here are his comments:

• ADA.STRINGS.UTF_ENCODING.ENCODING_ERROR : bad input at Item (1)

   "I guess this happens when you use a unicode escape sequence that codifies a code point beyond the unicode range (0 .. 0x10ffff). Definitely an error and should raise a Lexer_Error instead."

... and he created issue https://github.com/yaml/AdaYaml/issues/4

• CONSTRAINT_ERROR : text.adb:203 invalid data

   "This hints at a serious error in my custom string allocator that can lead to memory corruption. I have to investigate to be able to tell what goes wrong here."

... and then he found the problem: https://github.com/yaml/AdaYaml/issues/5

• CONSTRAINT_ERROR : Yaml.Dom.Mapping_Data.Node_Maps.Insert: attempt to insert key already in map

   "This happens when you try to parse a YAML mapping that has two identical keys (which is not conformant to the standard, which disallows that). However, the error should be caught and a Compose_Error should be raised instead."

... and he opened https://github.com/yaml/AdaYaml/issues/3

• CONSTRAINT_ERROR : yaml-lexer-evaluation.adb:283 overflow check failed
• CONSTRAINT_ERROR : yaml-lexer-evaluation.adb:286 overflow check failed
• CONSTRAINT_ERROR : yaml-lexer-evaluation.adb:289 overflow check failed

   "This is, thankfully, an obvious error: hex escape sequences in the input may have up to eight nibbles, so they represent a value range of 0 .. 2**32 - 1. I use, however, a Natural to store that value, which is a subtype of Integer, which is of platform-dependent range - in this case, it is probably 32-bit, but since it is signed, its range goes only up to 2**31 - 1. This would suffice in theory, since the largest unicode code point is 0x10ffff, but AdaYaml needs to catch cases that exceed this range."

... and he attached it to https://github.com/yaml/AdaYaml/issues/4

• STORAGE_ERROR : stack overflow or erroneous memory access

... and he created issue https://github.com/yaml/AdaYaml/issues/6 and changed the parsing mode of nested structures to avoid stack overflows (no more recursion).
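The overflow fix Felix describes can be sketched like this (an illustration of mine, not AdaYaml's actual code): accumulate the escape value in a 32-bit modular type, which covers all eight nibbles without overflow, then range-check against the largest Unicode code point before converting back to Natural.

```ada
--  Illustrative sketch (not AdaYaml's actual code): decoding a hex
--  escape of up to eight nibbles. Unsigned_32 covers 0 .. 2**32 - 1,
--  unlike Natural, whose upper bound is typically 2**31 - 1.
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Hex_Escape_Sketch is

   function Decode_Hex_Escape (Hex : String) return Natural is
      Value : Unsigned_32 := 0;

      function Digit (C : Character) return Unsigned_32 is
      begin
         case C is
            when '0' .. '9' => return Character'Pos (C) - Character'Pos ('0');
            when 'a' .. 'f' => return Character'Pos (C) - Character'Pos ('a') + 10;
            when 'A' .. 'F' => return Character'Pos (C) - Character'Pos ('A') + 10;
            when others     => raise Constraint_Error with "not a hex digit";
         end case;
      end Digit;
   begin
      if Hex'Length > 8 then
         raise Constraint_Error with "too many nibbles";
      end if;
      for C of Hex loop
         Value := Value * 16 + Digit (C);  --  cannot overflow Unsigned_32
      end loop;
      if Value > 16#10FFFF# then  --  beyond the largest Unicode code point
         raise Constraint_Error with "not a Unicode code point";
      end if;
      return Natural (Value);
   end Decode_Hex_Escape;

begin
   Ada.Text_IO.Put_Line (Natural'Image (Decode_Hex_Escape ("10FFFF")));
end Hex_Escape_Sketch;
```

In library code, one would raise the library's own lexer exception rather than Constraint_Error, so that fuzz harnesses can still treat it as a "controlled" failure.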
There were also some "hangs": AFL monitors the execution time of every test case and flags large timeouts as hangs, to be inspected separately from crashes. Felix took the examples with a long execution time and found an issue with the hashing of nodes. With all those error cases, Felix created an issue that references all the individual issues, and corrected them. After all the corrections, Felix gave me an analysis of the usefulness of the test:

   "Your findings mirror the test coverage of the AdaYaml modules pretty well: There was no bug in the parser, as this is the most well-tested module. One bug each was found in the lexer and the text memory management, as these modules do have high test coverage, but only because they are needed for the parser tests. And then three errors in the DOM code, as this module is almost completely untested."

After reading a first draft of this blog post, Felix noted that YAML.STREAM_ERROR was in fact an unexpected error in my test program:

   "Also, you should not exclude Yaml.Stream_Error. This error means that a malformed event stream has been encountered. Parsing a YAML input stream or serializing a DOM structure should *always* create a valid event stream unless it raises an exception - hence getting Yaml.Stream_Error would actually show that there's an internal error in one of those components. [...] Yaml.Stream_Error would only be an error with external cause if you generate an event stream manually in your code."

I had filtered this exception because I'd encountered it in the test suite available in the AdaYaml GitHub repository (it is in fact a copy of the reference YAML test suite). I wanted to use the complete test suite as a starting corpus, but examples 8G76 and 98YD crashed, which prevented me from starting the fuzzing session, so instead of removing the crashing test cases, I filtered out the exception... The fact that 2 test cases from the YAML test suite make my simple program crash is interesting, but can we find more cases?
I removed those 2 files from the initial corpus, and I focused the small test program on finding cases that crash on a YAML.STREAM_ERROR:

   exception
      when Occurence : others =>
         declare
            N : constant String := Ada.Exceptions.Exception_Name (Occurence);
         begin
            Ada.Text_IO.Put_Line
              (Ada.Exceptions.Exception_Information (Occurence));
            if N = "YAML.STREAM_ERROR" then
               GNAT.Exception_Actions.Core_Dump (Occurence);
            end if;
         end;
   end Yaml_Test;

In less than 5 minutes, AFL finds 5 categories of crashes:

• raised YAML.STREAM_ERROR : Unexpected event (expected document end): ALIAS
• raised YAML.STREAM_ERROR : Unexpected event (expected document end): MAPPING_START
• raised YAML.STREAM_ERROR : Unexpected event (expected document end): SCALAR
• raised YAML.STREAM_ERROR : Unexpected event (expected document end): SEQUENCE_START
• raised YAML.STREAM_ERROR : Unexpected event (expected document start): STREAM_END

Felix was quick to answer:

   "Well, seems like you've found a bug in the parser. This looks like the parser may generate some node after the first root node of a document, although a document always has exactly one root node. This should never happen; if the YAML contains multiple root nodes, this should be a Parser_Error."

I opened a new issue about this, to be checked later.

Fuzzing GNATCOLL.JSON

JSON parsers are a common fuzzing target, not that different from YAML; this could be interesting. Following the same pattern as the other fuzzing sessions, let's first build a simple unit test that reads and parses an input file given on the command line (first argument), using GNATCOLL.JSON (https://github.com/AdaCore/gnatcoll-core/blob/master/src/gnatcoll-json.ads). This time I massaged one of the unit tests into a simple "read a JSON file into memory, decode it and print it" test program, which we'll use for fuzzing. Note: for the exercise here I used GNATCOLL GPL 2016, because that's what I was using for a personal project.
You should probably use the latest version when you do this kind of testing, at least before you report your findings. The test program is very simple:

   procedure JSON_Fuzzing_Test is
      Filename  : constant String := Ada.Command_Line.Argument (1);
      JSON_Data : Unbounded_String := File_IO.Read_File (Filename);
   begin
      declare
         Value : GNATCOLL.JSON.JSON_Value :=
           GNATCOLL.JSON.Read (Strm => JSON_Data, Filename => Filename);
      begin
         declare
            New_JSON_Data : constant Unbounded_String :=
              GNATCOLL.JSON.Write (Item => Value, Compact => False);
         begin
            File_IO.Write_File (File_Name     => "out.json",
                                File_Contents => New_JSON_Data);
         end;
      end;
   end JSON_Fuzzing_Test;

The GPR file is simple, with a twist: to make sure we compile this program with gnatcoll, and that when we use afl-gcc we compile the library code with our substitution compiler, we "with" the actual "gnatcoll_full.gpr" (actual gnatcoll source code!) and not the one for the compiled library. Then we build the project in "debug" mode, to get all the runtime checks available:

   gprbuild -p -P gnat_json_fuzzing_test.gpr -XGnatcoll_Build=Debug

Then I tried to find a test corpus. One example is https://github.com/nst/JSONTestSuite, cited in "Parsing JSON is a minefield". There's a test_parsing folder there that contains 318 test cases. Running them first on the new simple test program already shows several "crash" cases:

• nice GNATCOLL.JSON.INVALID_JSON_STREAM exceptions
• Numerical value too large to fit into an IEEE 754 float
• Numerical value too large to fit into a Long_Long_Integer
• Unexpected token
• Expected ',' in the array value
• Unfinished array, expecting ending ']'
• Expecting a digit after the initial '-' when decoding a number
• Invalid token
• Expecting digits after 'e' when decoding a number
• Expecting digits after a '.'
when decoding a number
• Expected a value after the name in a JSON object at index N
• Invalid string: cannot find ending "
• Nothing to read from stream
• Unterminated object value
• Unexpected escape sequence

... which is fine, since you'll expect this specific exception when parsing user-provided JSON. Then I got to:

• raised ADA.STRINGS.INDEX_ERROR : a-strunb.adb:1482
   • n_string_1_surrogate_then_escape_u1.json
   • n_string_1_surrogate_then_escape_u.json
   • n_string_invalid-utf-8-in-escape.json
   • n_structure_unclosed_array_partial_null.json
   • n_structure_unclosed_array_unfinished_false.json
   • n_structure_unclosed_array_unfinished_true.json

For which I opened https://github.com/AdaCore/gnatcoll-core/issues/5

• raised CONSTRAINT_ERROR : bad input for 'Value: "16#??????"]#"
   • n_string_incomplete_surrogate.json
   • n_string_incomplete_escaped_character.json
   • n_string_1_surrogate_then_escape_u1x.json

For which I opened https://github.com/AdaCore/gnatcoll-core/issues/6

• ... and STORAGE_ERROR : stack overflow or erroneous memory access
   • n_structure_100000_opening_arrays.json

This last one can be worked around with ulimit -s unlimited (that is, removing the limit on stack size). Still, beware of your stack when parsing user-provided JSON. Similar problems appeared for AdaYaml and were hardened there, and I'm not sure whether this potential "denial of service by stack overflow" should be classified as a bug; it's at least something to know when using GNATCOLL.JSON on user-provided JSON data (which, I'm guessing, covers most API endpoints these days). These exceptions are the ones you don't expect and maybe didn't put a catch-all for; a clean GNATCOLL.JSON.INVALID_JSON_STREAM exception might be better.

Note: in all those test cases, I didn't check whether the results of the tests were OK; I just checked for crashes. It might be very interesting to check the correctness of GNATCOLL.JSON against this test suite.
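One common mitigation for the stack-overflow case, sketched here with invented names (this is not GNATCOLL.JSON's actual code), is to bound the recursion depth of the descent parser explicitly, so that input like n_structure_100000_opening_arrays.json raises a controlled exception instead of STORAGE_ERROR:

```ada
--  Illustrative sketch (not GNATCOLL.JSON's actual code): a depth guard
--  in a recursive-descent parser. Deeply nested input raises a
--  controlled exception instead of exhausting the stack.
with Ada.Text_IO;

procedure Depth_Guard_Sketch is
   Invalid_Input : exception;
   Max_Depth     : constant := 512;  --  arbitrary limit for this sketch

   --  Stand-in for a "parse value" rule: each '[' nests one level deeper.
   procedure Parse_Value
     (Input : String; Pos : in out Positive; Depth : Natural) is
   begin
      if Depth > Max_Depth then
         raise Invalid_Input with "nesting too deep";
      end if;
      if Pos <= Input'Last and then Input (Pos) = '[' then
         Pos := Pos + 1;
         Parse_Value (Input, Pos, Depth + 1);
      end if;
   end Parse_Value;

   Input : constant String := (1 .. 100_000 => '[');
   Pos   : Positive := 1;
begin
   Parse_Value (Input, Pos, 0);
exception
   when Invalid_Input =>
      Ada.Text_IO.Put_Line ("rejected: nesting too deep");
end Depth_Guard_Sketch;
```

The depth limit then becomes part of the library's documented behavior rather than a platform-dependent stack-size accident.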
Now let’s try, through fuzzing, to find more cases where you don’t get a clean GNATCOLL.JSON.INVALID_JSON_STREAM.

The first step is adding a final “catch-all” exception handler, to abort only on unwanted exceptions (not all of them):

exception
   --  we don’t want to abort on a “controlled” exception
   when GNATCOLL.JSON.INVALID_JSON_STREAM =>
      null;
   when Occurence : others =>
      Ada.Text_IO.Put_Line
        ("exception occured for " & Filename
         & " [" & Ada.Exceptions.Exception_Name (Occurence)
         & "] [" & Ada.Exceptions.Exception_Message (Occurence)
         & "] [" & Ada.Exceptions.Exception_Information (Occurence)
         & "]");
      GNAT.Exception_Actions.Core_Dump (Occurence);
end JSON_Fuzzing_Test;

Then we clean the generated executable:

gprclean -r -P gnat_json_fuzzing_test.gpr -XGnatcoll_Build=Debug

Then rebuild it using afl-gcc:

gprbuild --compiler-subst=Ada,/home/lionel/afl/afl-2.51b/afl-gcc -p -P gnat_json_fuzzing_test.gpr -XGnatcoll_Build=Debug

Then we generate an input corpus for AFL, keeping only the files that didn’t generate a call to abort() with the new JSON_Fuzzing_Test test program.

On first launch (AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=ON /home/lionel/aws/afl/afl-2.51b/afl-fuzz -m 1024 -i input -o output ../json_fuzzing_test @@), afl-fuzz complains:

[*] Attempting dry run with 'id:000001,orig:i_number_huge_exp.json'...
[-] The program took more than 1000 ms to process one of the initial test
    cases. This is bad news; raising the limit with the -t option is
    possible, but will probably make the fuzzing process extremely slow.

    If this test case is just a fluke, the other option is to just avoid
    it altogether, and find one that is less of a CPU hog.

[-] PROGRAM ABORT : Test case 'id:000001,orig:i_number_huge_exp.json' results in a timeout
         Location : perform_dry_run(), afl-fuzz.c:2776

…and it’s true, the i_number_huge_exp.json file takes a long time to parse:

[lionel@lionel fuzzing-session]$ time ../json_fuzzing_test input/i_number_huge_exp.json
input/i_number_huge_exp.json:1:2: Numerical value too large to fit into an IEEE 754 float
real    0m7.273s
user    0m3.717s
sys     0m0.008s

My machine isn’t fast, but still, this is a denial of service waiting to happen. I opened a ticket just in case.
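While waiting for a fix, one stopgap (my sketch, not a feature of GNATCOLL.JSON) is to bound the size of untrusted documents before they ever reach the parser. This is only a partial defense, since a small file can still carry a pathological value, but it is cheap. In Python terms, with an arbitrary 1 MiB cap:

```python
# Hedged sketch: refuse oversized documents before parsing, as a cheap
# first line of defense against parser denial of service. The limit is
# an arbitrary choice of mine, not a recommendation from the post.
import json

MAX_JSON_BYTES = 1 << 20   # 1 MiB

def parse_untrusted(data: bytes):
    if len(data) > MAX_JSON_BYTES:
        raise ValueError("JSON document too large")
    return json.loads(data)

print(parse_untrusted(b'{"a": 1}'))   # {'a': 1}
```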

Anyway let’s remove those input files that gave a timeout before we even started the fuzzing (the other ones are n_structure_100000_opening_arrays.json and n_structure_open_array_object.json).

During this first afl-fuzz run, in the start phase, a warning appears a lot of times:

[!] WARNING: No new instrumentation output, test case may be useless.

AFL looks through the whole input corpus, and checks whether input files have added any new basic block coverage to the already tested examples (also from the input corpus).
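A much-simplified model of that novelty check can be sketched in Python. The real AFL records edge hit counts in a 64 KiB shared-memory bitmap fed by compile-time instrumentation; this toy version (all names are mine) only tracks whether an edge was ever hit:

```python
# Toy model of AFL's "did this input produce new coverage?" question.
MAP_SIZE = 64   # AFL's real map has 65536 entries

virgin = [False] * MAP_SIZE   # edges never seen across all runs so far

def has_new_coverage(trace):
    """trace: edge indices hit by one execution of the target.
    Returns True if the input exercised at least one new edge."""
    new = any(not virgin[i] for i in trace)
    for i in trace:
        virgin[i] = True
    return new

print(has_new_coverage([3, 7]))   # True: the first input always adds coverage
print(has_new_coverage([7]))      # False: nothing new, a "useless" test case
```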

The initial phase ends with:

[!] WARNING: Some test cases look useless. Consider using a smaller set.
[!] WARNING: You probably have far too many input files! Consider trimming down.

To be the most efficient, afl-fuzz needs the slimmest input corpus with the highest basic block coverage, the most representative of all the OK code paths, and the least redundant possible. You can look through the afl-cmin and afl-tmin tools to minimize your input corpus.
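Conceptually, afl-cmin solves a set-cover problem: keep the fewest inputs that together preserve the corpus's total coverage. Here is a toy greedy version in Python; the file names and per-file coverage sets are made up, and in the real tool coverage comes from instrumented runs, not from a dict:

```python
# Greedy corpus minimization: keep the fewest inputs whose combined
# coverage equals that of the whole corpus. A simplification of what
# afl-cmin does; coverage sets here are invented for the demo.
def minimize(corpus):
    """corpus: {file_name: set_of_covered_edges} -> kept file names."""
    kept, covered = [], set()
    # Visit the largest coverage sets first so redundant files drop out.
    for name, edges in sorted(corpus.items(),
                              key=lambda kv: (-len(kv[1]), kv[0])):
        if not edges <= covered:   # this file contributes something new
            kept.append(name)
            covered |= edges
    return kept

demo = {
    "deep_array.json": {1, 2, 3, 4},
    "small_array.json": {2, 3},     # fully redundant with deep_array.json
    "bad_escape.json": {9},
}
print(minimize(demo))   # ['deep_array.json', 'bad_escape.json']
```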

For this session, let’s keep the test corpus as it is (large and redundant), and start the fuzzing session.

In the first seconds of fuzzing, we already get the following state:

Already 3 crashes, and 2 “hangs”. Looking through those, it seems afl-fuzz already found by itself examples of “ADA.STRINGS.INDEX_ERROR : a-strunb.adb:1482” and “CONSTRAINT_ERROR : bad input for 'Value: "16#?????”, although I removed from the corpus all files that showed those problems.

Same thing with the “hang”, afl-fuzz found an example of large float number, although I removed all “*_huge_*” float examples.

Let’s try and focus on finding something else than the ones we know.

I added the following code in the top-level exception handler:

   when Occurence : others =>
      declare
         Text : constant String :=
           Ada.Exceptions.Exception_Information (Occurence);
      begin
         if Ada.Strings.Fixed.Index
              (Source => Text, Pattern => "bad input for 'Value:") /= 0
         then
            return;
         elsif Ada.Strings.Fixed.Index
                 (Source => Text, Pattern => "a-strunb.adb:1482") /= 0
         then
            return;
         end if;
      end;

It’s very hacky but it’ll remove some parasites (i.e. the crashes we know) from the crash bin.
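The same filter translates naturally to a few lines of Python, shown here only to make the idea concrete (the signature list mirrors the two known issues):

```python
# Ignore crashes whose exception text matches an already-reported
# signature, so only new findings reach the "crash bin".
KNOWN_SIGNATURES = (
    "bad input for 'Value:",   # gnatcoll-core issue 6
    "a-strunb.adb:1482",       # gnatcoll-core issue 5
)

def is_known_crash(exception_text):
    return any(sig in exception_text for sig in KNOWN_SIGNATURES)

print(is_known_crash("raised CONSTRAINT_ERROR : bad input for 'Value: ..."))
# True
print(is_known_crash("raised PROGRAM_ERROR : something brand new"))
# False
```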

Let’s restart the fuzzing session (remove the output/ directory, recreate it, and call afl-fuzz again).

Now after 10 minutes, no crash had occurred, so I let the fuzzer run for 2 days straight, and it didn’t find any crash or hang other than the ones already triggered by the test suite.

It did however find some additional stack overflows (with examples that open a lot of arrays) even though I had put 1024m as a memory limit for afl-fuzz… Maybe something to look up...
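This failure class is easy to reproduce against other recursive-descent parsers. Here is a Python illustration: CPython raises RecursionError instead of overflowing the native stack, but the cause, one stack frame per opening '[', is the same one that produced STORAGE_ERROR above:

```python
# Deeply nested (but perfectly valid) JSON defeats recursive parsers:
# each '[' consumes a stack frame. GNATCOLL.JSON hit STORAGE_ERROR on
# 100000 opening arrays; CPython's json raises RecursionError instead.
import json

deep = "[" * 20000 + "]" * 20000   # a 20000-deep nested empty array

try:
    json.loads(deep)
    outcome = "parsed"
except RecursionError:
    outcome = "RecursionError"

print(outcome)   # RecursionError
```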

What next?

If you want to dive deeper in the subject of fuzzing with AFL, here's a short reading list for you:

• Even simpler, if you have an extensive test case list, you can use afl-cmin (a corpus minimizer) to directly fuzz your parser or application efficiently, see the great success of AFL on sqlite.
• The fuzzing world took on the work of lcamtuf and you can often hear about fuzzing-specific passes in clang/llvm to help fuzzing where it's bad at (checksums, magic strings, whole-string comparisons...).
• There's a lot of tooling around afl-fuzz: aflgo directs the focus of a fuzzing session to specific code parts, and Pythia helps evaluate the efficiency of your fuzzing session. See also afl-cov for live coverage analysis.

If you find bugs, or even just perform a fuzzing pass on your favorite open-source software, don't hesitate to get in touch with the maintainers of the project. From my experience, most of the time, maintainers will be happy to get free testing. Even if it's just to say that AFL didn't find anything in 3 days... It's already a badge of honor :-).

Thanks:

Many thanks to Yannick Moy that sparked the idea of this blog post after I talked his ear off for a year (was it two?) about AFL and fuzzing in Ada, and helped me proof-read it. Thanks to Gautier and Felix who were very reactive and nice about the reports, and who took some time to read drafts of this post. All your suggestions were very helpful.


Libadalang has come a long way since the last time we blogged about it. In the past 6 months, we have been working tirelessly on name resolution, a pretty complicated topic in Ada, and it is finally mature enough that we feel ready to blog about it and to encourage people to try it out.

WARNING: While pretty far along, the work is still not finished. It is expected that some statements and declarations are not yet resolved. You might also run into the occasional crash. Feel free to report that on our GitHub!

In our last blog post, we learned how to use Libadalang’s lexical and syntactic analyzers in order to highlight Ada source code. You may know websites that display source code with cross-referencing information: this makes it possible to navigate from references to declarations. For instance elixir, Free Electrons’ Linux source code explorer: go to a random source file and click on an identifier. This kind of tool makes it very easy to explore an unknown code base.

So, we extended our code highlighter to generate cross-references links, as a showcase of Libadalang’s semantic analysis abilities. If you are lazy, or just want to play with the code, you can find a compilable set of source files for it at Libadalang’s repository on GitHub (look for ada2web.adb). If you are interested in how to use name resolution in your own programs, we will use this blog post to show how to use Libadalang’s name resolution to expand our previous code highlighter.

Note that if you haven’t read the previous blog post, we recommend reading it first: below, we assume familiarity with the topics it covers.

Where are my source files?

Unlike lexical and syntactic analysis, which process source files separately, semantic analysis works on a set of source files, or more precisely on a source file plus all its dependencies. This is logical: in order to understand an object declaration in foo.ads, one needs to know about the corresponding type, and if the type is declared in another source file (say bar.ads), both files are required for analysis.

By default, Libadalang assumes that all source files are in the current directory. That’s enough for toy source files, but not at all for real-world projects, which are generally spread over multiple directories in a complex nesting scheme. Libadalang can’t know about the file layout of all Ada projects in the world, so we created an abstraction that enables anyone to tell it how to reach source files: the Libadalang.Analysis.Unit_Provider_Interface interface type. This type has exactly one abstract primitive: Get_Unit, which, given a unit name and a unit kind (specification or body?), calls Analysis_Context’s Get_From_File or Get_From_Buffer to create the corresponding analysis unit.

In the context of a source code editor (for instance), this allows Libadalang to query a source file even if this file exists only in memory, not in a real source file, or if it’s more up-to-date in memory. Using a custom unit provider in Libadalang is easy: dynamically allocate a concrete implementation of this interface, then pass it to the Unit_Provider formal in Analysis_Context’s constructor: the Create function. Libadalang will take care of deallocating this object when the context is destroyed.

declare
   UP  : My_Unit_Provider_Access :=
     new My_Unit_Provider_Type …;
   Ctx : Analysis_Context := Create (Unit_Provider => UP);
   --  UP will be queried when performing name resolution
begin
   --  Do useful things, and then when done…
   Destroy (Ctx);
end;

Nowadays, a lot of Ada projects use GPRbuild and thus have a project file. That’s fortunate: project files give us exactly the information Libadalang needs: where the source files are and what their naming scheme is. Because of this, Libadalang provides a tagged type that implements this interface to deal with project files: Project_Unit_Provider_Type, from the Libadalang.Unit_Files.Projects package. In order to use it, one first needs to load the project file using GNATCOLL.Projects:

declare
   Project_File : GNATCOLL.VFS.Virtual_File;
   Project      : GNATCOLL.Projects.Project_Tree_Access;
   Env          : GNATCOLL.Projects.Project_Environment_Access;
begin
   --  First load the project file
   Project := new Project_Tree;
   Initialize (Env);

   --  Initialize Project_File, set the target, create
   --  scenario variables, …

   --  Now create the unit provider and the analysis context.
   --  Is_Project_Owner is set to True so that the project
   --  is deallocated when UP is destroyed.
   UP := new Project_Unit_Provider_Type'
     (Create (Project, Env, True));
   Ctx := Create (Unit_Provider => UP);

   --  Do useful things, and then when done…
   Destroy (Ctx);
end;

Now that Libadalang knows where the source files are, we can ask it to resolve names!

Just like in the highlighter, most of the website generator will consist of asking Libadalang to parse source files (Get_From_File), checking for lexing/parsing errors (Has_Diagnostics, Diagnostics) and then dealing with AST nodes and tokens in analysis units. The new bit here is turning identifiers into hypertext links that redirect to their definition. As for highlighting classes, we do this token annotation with an array and a tree traversal:

Unit : Analysis_Unit := …;
--  Analysis unit to process

Xrefs : array (1 .. Token_Count (Unit)) of Basic_Decl :=
(others => No_Basic_Decl);
--  For each token, the declaration to which the token should
--  link or No_Basic_Decl for no cross-reference.

function Process_Node
  (Node : Ada_Node'Class) return Visit_Status;
--  Callback for AST traversal. For string literals and
--  identifiers, annotate the corresponding entry in Xrefs
--  with the designated declaration, if found.

With these declarations, we can do the annotations easily:

Root (Unit).Traverse (Process_Node'Access);

But how does Process_Node do its magic? That’s easy too:

function Process_Node
  (Node : Ada_Node'Class) return Visit_Status is
begin
   --  Annotate only tokens for string literals and
   --  identifiers; other nodes are merely traversed.
   if Node.Kind not in Ada_String_Literal | Ada_Identifier then
      return Into;
   end if;

   declare
      Token : constant Token_Type :=
        Node.As_Single_Tok_Node.F_Tok;
      Idx   : constant Natural := Natural (Index (Token));
      Decl  : Basic_Decl renames Xrefs (Idx);
   begin
      Decl := Node.P_Referenced_Decl;
   exception
      when Property_Error => null;
   end;

   return Into;
end Process_Node;

String literal and identifier nodes both inherit from the Single_Tok_Node abstract node, hence the conversion to retrieve the underlying token. Then we locate the cell in the Xrefs array they correspond to. And finally we fill it with the result of the P_Referenced_Decl primitive. This function tries to fetch the declaration corresponding to Node. Easy, as I said!

You might ask, though: what is the exception handler for? What we call AST node properties (all functions whose names start with P_) can raise Property_Error exceptions. These can happen when Libadalang works on invalid Ada sources and cannot compute query results. As name resolution is still actively developed, this exception can also be raised for valid source code: if that happens to you, please report this bug! Note that if a property raises an exception that is not a Property_Error, that is another kind of bug: please report it too!

Bind it all together

Now we have a list of Basic_Decl nodes to create hypertext links, but how can we do that? The trick is to get the name of the source file that contains this declaration, plus its source location:

Decl_Unit : constant Analysis_Unit := Decl.Get_Unit;
Decl_File : constant String := Get_Filename (Decl_Unit);
Decl_Line : constant Langkit_Support.Slocs.Line_Number :=
Decl.Sloc_Range.Start_Line;

Then you can turn this information into a hypertext link. For example, if you generate X.html for the X source file (foo.ads.html for foo.ads, …) and generate LY HTML anchors for line number Y:

Line_No : constant String :=
Natural'Image (Natural (Decl_Line));
Href : constant String :=
Decl_File & ".html#L"
& Line_No (Line_No'First + 1 .. Line_No'Last);

Some amount of plumbing is still needed to have a complete website generator:

• get a list of all source files to process in the loaded project, using GNATCOLL.Projects’ API;
• actually output HTML code: code from the previous blog post can be reused and updated to do this;
• generate an index HTML file as an entry point for navigation.

But as usual, covering all these topics would go beyond the scope of this blog post and make for an unreasonably long essay. So thank you once more for reading this post to the end!


Summary

The Ada IoT Stack consists of an lwIp (“lightweight IP”) stack implementation written in Ada, with an associated high-level protocol to support embedded device connectivity nodes for today’s IoT world. The project was developed for the Make With Ada 2017 competition based on existing libraries and ported to embedded STM32 devices.

Motivation

Being a huge fan of IoT designs, I originally planned to work on a real device such as an IoT node or gateway, using the Ada language. I really enjoy writing programs but my roots are in hardware (I earned a B.S. in electronic engineering). Back then I was not really a programmer, but if you can master the intricacies of hardware, I think that experience will eventually help make you a good programmer; I’m not sure it works the same way in the other direction.

With so many programming languages out there, it's difficult to gain experience with all of them. In my case I was not even aware of the Ada language until I found out about the contest. That got me to learn Ada: there is a saying that without deadlines nobody finishes on time, and the contest supplied both the motivation and a deadline. For me the best way to learn something is not to read a book chapter by chapter. Sometimes you need to read a “getting started” guide to gain basic knowledge, but after that, if you don't have a problem to solve, your motivation might come to an end. I chose to continue with the IoT project.

Network Stack

When I started the IoT project I soon realized that the Ada Drivers Library didn't provide a TCP/IP stack. The Etherscope software by Mr. Carrez, from the 2016 Make with Ada contest, provided Ethernet connectivity and the UDP protocol, but not TCP. I started to look at how to implement the TCP/IP stack myself, but given my lack of experience in Ada, and with the contest running for only about two months, doing it from scratch would not have worked. So I wrote to Mr. Chouteau (at AdaCore) asking about any Ada libraries that implement a TCP stack, and he referred me to an lwIp implementation in Ada and SPARK 2014. It lived in a SPARK 2014 git repository that had not shown up in a Google search. As far as I could tell, the code had only been tested on a Linux-flavor OS using a TAP interface. I spent a couple of weeks studying the code and getting it to work on my Debian box, which got a little hacky at the end since the TAP C driver implementation (system calls) didn't work as expected; I ended up coding the TAP interface by hand.

A TAP device operates at layer 2 of the OSI model, which means Ethernet frames. That makes sense here, since lwIp implements networking starting from Ethernet datagrams. Then I realized that I could combine the Etherscope project with the lwIp implementation. After removing several parts of the Etherscope project to get only Ethernet frames, I was ready to feed the lwIp stack. It was not so easy: I spent some time porting the lwIp Ada code to the Ada version for embedded ARM. One obstacle I found was the use of XML files to describe the network datagrams and other structures, which a program called xmlada uses to generate some Ada body files. These describe things like the bit and byte positions of TCP flags or fields within the datagram. The problem was that the ARM version doesn't provide xmlada, so I ended up copying the generated files into my project.

After quite some time I got the lwIp stack to work on my STM32F769I board. This was no easy task, especially because the STLink debugger is not so easy to work with. (For example, semihosting is basically the only way to get debugger output in the form of "printf", and it is really slow and interrupts the flow of program execution in a nasty way. The problem here is that the ST board doesn't provide a JTAG interface to the Cortex M4/M7 device, and the on-board STLink doesn't have an SWO line connection.)

IoT Stack

The TCP/IP stack was just the beginning; it was really nice to see it working, but that quickly gets boring. The original lwIp implements a TCP echo server: you open a socket, connect, and then anything you send is echoed back by the server, which is not very useful for IoT. So I felt I was not making real progress, at least toward something that would give the judges a tangible project to evaluate. Again I was in a rush, this time with more knowledge of Ada but, as be