Martyn’s recent blog post showed small programs based on Libadalang to find uses of access types in Ada sources. Albeit short, these programs need to take care of all the tedious logistics around processing Ada sources: find the files to work on, create a Libadalang analysis context, use it to read the source files, etc. Besides, they are not very convenient to run:

$gprls -s -P test.gpr | ./ptrfinder1 | ./ptrfinder2 The gprls command (shipped with GNAT Pro) is used here in order to get the list of sources that belong to the test.gpr project file. Wouldn’t it be nice if our programs could use the GNATCOLL.Projects API in order to read this project file themselves and get the list of sources to process from there? It’s definitely doable, but also definitely cumbersome: first we need to get the appropriate info from the command line (project file name, potentially target and runtime information, or a *.cgpr configuration file), then call all the various APIs to load the project, and many more operations. Such operations are so common for tools using Libadalang that we have decided to include helpers to factor this in the library itself, so that programs can focus on their real purpose. The 20.1 Libadalang release provides building blocks to save you this trouble: check the App generic package in Libadalang.Helpers. Note that you can see a tutorial and its API reference for it in our nightly documentation. This package is intended to be used as a framework: you instantiate it with your settings at the top-level of your program and call its Run procedure. App then takes over the control of the program: it parses command-line options and invokes when appropriate the callbacks you provided it. Let’s update Martyn’s programs to use App. The job of the first program (ptrfinder1) is to go through source files and report access type declarations and object declarations that have access types. First, we declare some shortcuts for code brevity:  package Helpers renames Libadalang.Helpers; package LAL renames Libadalang.Analysis; package Slocs renames Langkit_Support.Slocs; Next, we can instantiate App:  procedure Process_Unit (Job_Ctx : Helpers.App_Job_Context; Unit : LAL.Analysis_Unit); -- Look for the use of access types in Unit package App is new Helpers.App (Name => "ptrfinder1", Description => "Look for the use of access types in the input sources", Process_Unit => Process_Unit); Naturally, the Process_Unit procedure will be called once for each file to process. The Name and Description formals allow the automatic generation of a “help” message on the command-line (see later). Implementing the Process_Unit procedure is as easy as running minor adjustments on Martyn’s original code:  procedure Report (Node : LAL.Ada_Node'Class); -- Report the use of an access type at Filename/Line_Number on the standard -- output. ------------ -- Report -- ------------ procedure Report (Node : LAL.Ada_Node'Class) is Filename : constant String := Node.Unit.Get_Filename; Line : constant Slocs.Line_Number := Node.Sloc_Range.Start_Line; begin Put_Line (Filename & ":" & Ada.Strings.Fixed.Trim (Line'Image, Ada.Strings.Left)); end Report; ------------------ -- Process_Unit -- ------------------ procedure Process_Unit (Job_Ctx : Helpers.App_Job_Context; Unit : LAL.Analysis_Unit) is pragma Unreferenced (Job_Ctx); function Process_Node (Node : Ada_Node'Class) return Visit_Status; -- Callback for LAL.Traverse ------------------ -- Process_Node -- ------------------ function Process_Node (Node : Ada_Node'Class) return Visit_Status is begin case Node.Kind is when Ada_Base_Type_Decl => if Node.As_Base_Type_Decl.P_Is_Access_Type then Report (Node); end if; when Ada_Object_Decl => if Node.As_Object_Decl.F_Type_Expr .P_Designated_Type_Decl.P_Is_Access_Type then Report (Node); end if; when others => -- Nothing interesting was found in this Node so continue -- processing it for other violations. return Into; end case; -- A violation was detected, skip over any further processing of this -- node. return Over; end Process_Node; begin if not Unit.Has_Diagnostics then Unit.Root.Traverse (Process_Node'Access); end if; end Process_Unit;  We’re nearly done! All that’s left to do is to make our program only call the Run procedure: begin App.Run; end ptrfinder1; That’s it. Build and run this program: $ ./ptrfinder1
No source file to process

$./ptrfinder1 basic_pointers.adb /tmp/access-type-detector/test/basic_pointers.adb:3 /tmp/access-type-detector/test/basic_pointers.adb:5 So far, so good. $ ./ptrfinder1 --help
usage: ptrfinder1 [--help|-h] [--charset|-C CHARSET] [--project|-P PROJECT]
[--scenario-variable|-X
SCENARIO-VARIABLE[SCENARIO-VARIABLE...]] [--target TARGET]
[--RTS RTS] [--config CONFIG] [--auto-dir|-A
AUTO-DIR[AUTO-DIR...]] [--no-traceback] [--symbolic-traceback]
files [files ...]

Look for the use of access types in the input sources

positional arguments:
files                 Files to analyze

optional arguments:
--help, -h            Show this help message
[…]

Wow, that’s a lot! As you can see, App takes care of parsing command-line arguments and provides a lot of built-in options. Most of them are for the various ways to communicate to the application the set of source files to process:

• "ptrfinder1 source1.adb source2.adb …" will process all source files on the command-line, assuming that all source files belong to the current directory;
• "ptrfinder1 -P my_project.gpr [-XKEY=VALUE] [--target=…] [--RTS=…] [--config=…]" will process all source files that belong to the my_project.gpr project file. If additional  source files appear on the command-line, ptrfinder1 will process only them, but my_project.gpr will still be used to find the other source files.

• "ptrfinder1 --auto-dir=src1 --auto-dir=src2" will process all Ada source files that can be found in the src1 and src2 directories. Likewise, additional source files on the command-line will restrict processing to them.

These three use cases should cover most needs, the most reliable one being the project file way: calling gprbuild on the project file (with the same arguments) is a cheap way to check using the compiler that the set of sources passed to the application/Libadalang is complete, consistent and valid Ada.

As it is a common gotcha, let’s take a moment to note that even though your application may process only one source file, Libadalang may need to get access to other source files. For instance, computing the type of a variable in source1.adb may require to read pkg.ads, which defines the type of this variable. This is why passing a project file or --auto-dir options is useful even when you pass the list of source files to process explicitly on the command-line.

Martyn’s second program (ptrfinder2) doesn’t use Libadalang, so rewriting it to use App isn’t very interesting. Instead, let’s extend the previous program to run the text verification on the fly. We are going to add a command-line option to our application to optionally do the verification. Right after the App instantiation, add:

   package Do_Verify is new GNATCOLL.Opt_Parse.Parse_Flag
(App.Args.Parser,
Long => "--verify",
Help => "Verify detected ""access"" occurences");

App’s command-line parser (App.Args.Parser) uses the GNATCOLL.Opt_Parse library, so adding support for new command-line options is very easy. Here, we add a flag, i.e. a switch with no argument: it’s either present or absent. Just doing this already extends the automatic help message:

$./ptrfinder1 --help usage: ptrfinder1 […] files [files ...] [--verify] Look for the use of access types in the input sources positional arguments: files Files to analyze optional arguments: […] --verify, Verify detected "access" occurences Now we can modify the Report procedure to handle this option:  function Verify (Filename : String; Line : Slocs.Line_Number) return Boolean; -- Return whether Filename can be read and that its Line'th line contains -- the " access " substring. procedure Report (Node : LAL.Ada_Node'Class); -- Report the use of an access type at Filename/Line_Number on the standard -- output. If --verify is enabled, check that the first source line -- corresponding to Node contains the " access " substring. ------------ -- Verify -- ------------ function Verify (Filename : String; Line : Slocs.Line_Number) return Boolean is -- Here, we could directly look for an "access" token in the list of -- tokens corresponding to Line in this unit. However, in the spirit of -- the original program, re-read the file with Ada.Text_IO. Found : Boolean := False; -- Whether we have found the substring on the expected line File : File_Type; -- File to read (Filename) begin Open (File, In_File, Filename); for I in 1 .. Line loop declare use type Slocs.Line_Number; Line_Content : constant String := Get_Line (File); begin if I = Line and then Ada.Strings.Fixed.Index (Line_Content, " access ") > 0 then Found := True; end if; end; end loop; Close (File); return Found; exception when Use_Error | Name_Error | Device_Error => Close (File); return Found; end Verify; ------------ -- Report -- ------------ procedure Report (Node : LAL.Ada_Node'Class) is Filename : constant String := Node.Unit.Get_Filename; Line : constant Slocs.Line_Number := Node.Sloc_Range.Start_Line; Line_Image : constant String := Ada.Strings.Fixed.Trim (Line'Image, Ada.Strings.Left); begin if Do_Verify.Get then if Verify (Filename, Line) then Put_Line ("Access Type Verified on line #" & Line_Image & " of " & Filename); else Put_Line ("Suspected Access Type *NOT* Verified on line #" & Line_Image & " of " & Filename); end if; else Put_Line (Filename & ":" & Line_Image); end if; end Report; And voilà! Let’s check how it works: $ ./ptrfinder1 basic_pointers.adb --verify
Access Type Verified on line #3 of /tmp/access-type-detector/test/basic_pointers.adb
Access Type Verified on line #5 of /tmp/access-type-detector/test/basic_pointers.adb

When writing Libadalang-based tools, don’t waste time with trivialities such as command-line parsing: use Libadalang.Helpers.App and go directly to the interesting parts!

You can find the compilable project for this post on my GitHub fork. Just make sure you get Libadalang 20.1 or the next Continuous Release (coming in February 2020). As usual, please send us suggestions and bug reports on GNATtracker (if you are an AdaCore customer) or on Libadalang’s GitHub project.

]]>

The GNAT-LLVM project provides an opportunity to port Ada to new platforms, one of which is WebAssembly. We conducted an experiment to evaluate the porting of Ada and the development of bindings to use Web API provided by the browser directly from Ada applications.

## Subtotals

As a result of the experiment, the standard language library and runtime library were partially ported. Together with a binding for the Web API, this allowed us to write a simple example showing the possibility of using Ada for developing applications compiled into WebAssembly and executed inside the browser. At the same time, there are some limitations both of WebAssembly and of the current GNAT-LLVM implementation:

• the inability to use tasks and protected types
• support for exceptions limited to local propagation and the last chance handler
• the inability to use nested subprograms

## Example

Here is small example of an Ada program that shows/hides the text when pressing the button by manipulating attributes of document nodes.

with Web.DOM.Event_Listeners;
with Web.DOM.Events;
with Web.HTML.Buttons;
with Web.HTML.Elements;
with Web.Strings;
with Web.Window;

package body Demo is

function "+" (Item : Wide_Wide_String) return Web.Strings.Web_String
renames Web.Strings.To_Web_String;

type Listener is
limited new Web.DOM.Event_Listeners.Event_Listener with null record;

overriding procedure Handle_Event
(Self  : in out Listener;
Event : in out Web.DOM.Events.Event'Class);

L : aliased Listener;

------------------
-- Handle_Event --
------------------

overriding procedure Handle_Event
(Self  : in out Listener;
Event : in out Web.DOM.Events.Event'Class)
is
X : Web.HTML.Elements.HTML_Element
:= Web.Window.Document.Get_Element_By_Id (+"toggle_label");

begin
X.Set_Hidden (not X.Get_Hidden);
end Handle_Event;

---------------------
-- Initialize_Demo --
---------------------

procedure Initialize_Demo is
B : Web.HTML.Buttons.HTML_Button
:= Web.Window.Document.Get_Element_By_Id
(+"toggle_button").As_HTML_Button;

begin
B.Set_Disabled (False);
end Initialize_Demo;

begin
Initialize_Demo;
end Demo;


As you can see, it uses elaboration, tagged and interface types, and callbacks.

## Setup & Build

To compile the examples you need to setup GNAT-LLVM & GNAT WASM RTL following instructions in README.md file.

To compile specific example use gprbuild to build application and open index.html in the browser to run it.

## Next steps

The source code is published in a repository on GitHub and we invite everyone to participate in the project.

### Attachments

]]>

Like last year and the year before, AdaCore will participate to the celebration of Open Source software at FOSDEM. It is always a key event for the Ada/SPARK community and we are looking forward to meet Ada enthusiasts. You can check the program of the Ada/SPARK devroom here.

We have a talk in the Hardware Enablement devroom:

And there is a related talk in the Security devroom on the use of SPARK for security:

Hope to see you at FOSDEM this week-end!

]]>

In the last couple of years, the maker community switched from AVR based micro-controllers (popularized by Arduino) to the ARM Cortex-M architecture. AdaFruit was at the forefront of this migration, with boards like the Circuit Playground Express or some of the Feathers.

AdaFruit chose to adopt the Atmel (now Microchip) SAMD micro-controller family. Unfortunately for us it is not in the list of platforms with the most Ada support so far (stay tuned, this might change soon ;)).

So I was quite happy to see AdaFruit release their first Feather format board including a micro-controller with plenty of Ada support, the STM32F4. I bought a board right away and implemented some support code for it.

The support for the Feather STM32F405 is now available in the Ada Drivers Library, along with two examples. The first just blinks the on-board LED and the second displays Make With Ada on a CharlieWing expansion board.

## Setup

You then have to run the script scripts/install_dependencies.py to install the run-time BSPs.

## Build

To build the example, open one of the the project files examples/feather_stm32f405/blinky/blinky.gpr or examples/feather_stm32f405/charlie_wing/charlie_wing.gpr with GNATstudio (Aka GPS), and click on the “build all” icon.

## Program the board

To program the example on the board, I recommend using the Black Magic Probe debugger (also available from AdaFruit). This neat little device provides a GDB remote server interface to the STM32F4, allowing you not only to program the micro-controller but also to debug it.

An alternative is to use the DFU mode of the STM32F4.

Happy hacking :)

]]>

For nearly four decades the Ada language (in all versions of the standard) has been helping developers meet the most stringent reliability, safety and security requirements in the embedded market. As such, Ada has become an entrenched player in its historic A&D niche, where its technical advantages are recognized and well understood. Ada has also seen usage in other domains (such as medical and transportation) but its penetration has progressed at a somewhat slower pace. In these other markets Ada stands in particular contrast with the C language, which, although suffering from extremely well known and documented flaws, remains a strong and seldom questioned default choice. Or at least, when it’s not the choice, C is still the starting point (a gateway drug?) for alternatives such as C++ or Java, which in the end still lack the software engineering benefits that Ada embodies..

Throughout AdaCore’s twenty-five year history, we’ve seen underground activities of software engineers willing to question the status quo and embark on new technological grounds. But driving such a change is a tough sell. While the merits of the language are usually relatively easy to establish, overcoming the surrounding inertia often feels like an insurmountable obstacle. Other engineers have to be willing to change old habits. Management has to be willing to invest in new technology. All have to agree on the need for safer, more secure and more reliable software. Even if we’ve been able to report some successes over the years, we were falling short of the critical mass.

Or so it seemed.

The tide has turned. 2018 and 2019 have been exceptional vintages in terms of Ada and SPARK adoption, all the signs are showing that 2020 will be at least as exciting. What’s more - the new adopters are coming from industries that were never part of the initial Ada and SPARK user base. What used to be inertia is now momentum. Let’s take a look at the information that can be gathered from the web over the past two years to demonstrate the new dynamic of Ada and SPARK usage.

# The Established User Base

Due to the nature of the domain, it is difficult to communicate specifically about these projects, and we only have scarce news. One measure of the increasing interest in Ada and SPARK can be inferred from defense-driven research projects which contain references to these language technologies. The most notable example is the recent UK-funded HICLASS project, focused on security, which involves a large portion of the UK defense industry. Some press releases are also available, in particular in the space domain (European Space Agency, AVIO and MDA). These data samples are representative of a very active and vibrant community which is committed to Ada and SPARK for decades to come - effectively guaranteeing their industrial future as far as we can reasonably guess.

The so-called “established user base” has fueled the Ada and SPARK community up until roughly the mid 2010s. At that point of time, a new trend started to emerge, from users and use cases that we’had never seen before. While each case is a story in its own right, some common patterns have emerged. The starting point is almost always either the increase of safety or security requirements, or a wish to reduce the costs of development of an application with some kind of high reliability needs. This is connected to the acknowledgement that the programming language in use - almost exclusively C or C++ - may not be the optimal language to reach these goals. This is well documented in the industry; C and C++ flaws have been the subject of countless papers, and the source of catastrophic vulnerability exploits and tools to work around issues. The technical merits of Ada and its ability to prevent many of these issues is also well documented - we even have access to some measurements of the effects. The most recent one is an independent study developed by VDC, which measured up to 38% cost-savings on Ada vs C in the context of high-integrity markets that have adopted Ada for a long time.

We’re talking a lot about Ada here, but in fact new adopters are typically driven by a mix of SPARK and Ada. The promise that SPARK offers is automatic verification of software properties such as absence of buffer overflow, together with stringent mitigation of others - and this by design, early in the development process. This means that developers are able to self-check their code - not only is the code more reliable, it is also more reliable straight out as you write it, avoiding many mistakes that could otherwise pass through testing, integration or deployment phases.

Some of the SPARK adopters motivated by these benefits come from academia. Over the past 2 years, over 40 universities have joined the GNAT Academic Program (“GAP”), with a mix of teaching and research activities, including for example FH Campus Wien train project, CubeSat and UPMSat-2.

Many adopters can also be found in industry. Some of the following references highlight teams at the research phase, some others represent projects already deployed. They all however contribute to this solid wave of new Ada and SPARK adopters. The publications referenced in the following paragraphs have been published between 2018 and 2019.

One obvious application for Ada and SPARK, where human lives are at risk, is the medical device domain. So it comes without surprise that this area is amongst those adopting the technology. Two interesting cases come to mind. The first one in RealHeart, a Scandinavian manufacturer that is developing an artificial heart with on-board software written in Ada and SPARK, who issued a press release and later made an in-depth presentation at SPARK & Frama-C days. The second reference comes from a large medical device corporation, Hillrom, who published a paper explaining the rationale for the selection of SPARK and Ada for development of ECG algorithms.

Another domain is everything that relates to security. The French security agency ANSSI studied various languages to implement a secure USB key and selected SPARK as the best choice. They published a research paper, presentation and source code. Another interesting new application has been implemented by a German company Componolit developing proven communication protocols

Of course, established markets are also at the party. The University of Colorado’s Laboratory for Atmospheric and Space Physics has recently adopted Ada to develop an application for the International Space Station. In the defense domain, the Air Force Research Labs is studying re-writing a drone framework from C++ to SPARK and doing functional proofs, with a public research paper and source code available.

While all of these domains provide interesting adopter stories, the one single domain that has demonstrated the most interest in the recent past is undoubtedly automotive. This is probably coming from the increasing complexity of electronics systems in cars, with applications such as Advanced Driver Assistance Systems (ADAS) and autonomous vehicles. References in this domain ranges from tier 1 suppliers such as Denso or JTEKT as well as OEMs and autonomous vehicle companies like Volvo’s subsidiary Zenuity

And there’s NVIDIA.

In January of this year, we published with NVIDIA a press release and a blog post, followed-up this November by a presentation at our annual Tech Days conference, and an on-line webex (also see the slides for the webex). In many respects, this is a unique tipping point in the history of Ada adoption in terms of impact in a non-A&D domain, touching considerations ranging from security to automotive safety, all under the tight constraints of firmware development. The webex in particular provides a unique dive into the reasons behind the adoption of SPARK and Ada by a company that didn’t have any particular ties to it initially. It also gives key insights on the challenges and costs of such an adoption, together with the benefits already observed. In many respects, this is almost an adoption guide to the technology from a business standpoint.

# Wrapping Up

Keep in mind that the above references are only those that are publicly available, which we know about. There are many more projects under the hood, and even more that we’re not even aware of. Everything considered, this is a very exciting time for the Ada and SPARK languages. Stay tuned, we have an array of new stories coming up for the months and years to come!

]]>

## What's changed?

In 2019 AdaCore created a UK business unit and embarked on a new and collaborative venture researching and developing advanced UK aerospace systems. This blog introduces the reader to ‘HICLASS’, describes our involvement and explains how participation in this project is aligned with AdaCore’s core values.

## Introducing HICLASS

The “High-Integrity, Complex, Large, Software and Electronic Systems” (HICLASS) project was created to enable the delivery of the most complex, software-intensive, safe and cyber-secure systems in the world. HICLASS is a strategic initiative to drive new technologies and best-practice throughout the UK aerospace supply chain, enabling the UK to affordably develop systems for the growing aircraft and avionics market expected over the coming decades. HICLASS includes key prime contractors, system suppliers, software tool vendors and Universities working together to meet the challenges of growing system complexity and size. HICLASS will allow the development of new, complex, intelligent and internet-connected electronic products that are safe and secure from cyber-attack and can be affordably certified.

The HICLASS project is supported by the Aerospace Technology Institute (ATI) Programme, a joint Government and industry investment to maintain and grow the UK’s competitive position in civil aerospace design and manufacture. The programme, delivered through a partnership between the ATI, Department for Business, Energy & Industrial Strategy (BEIS) and Innovate UK, addresses technology, capability and supply chain challenges.

The £32m investment program, led by Rolls-Royce Control Systems, focuses on the UK civil aerospace sector but also has direct engagement with the Defence, Science and Technology Laboratory (DSTL). The collaborative group, comprised of 16 funded partners and 2 unfunded partners, is made up of the following system developers, tool suppliers and academic institutions: AdaCore, Altran, BAE Systems, Callen-Lenz, Cobham, Cocotec, D-Risq, GE Aviation, General Dynamics UK, Leonardo, MBDA, University of Oxford, Rapita Systems, Rolls-Royce, University of Southampton, Thales, Ultra Electronics and University of York. As well as researching and developing advanced aerospace capabilities, the group aims to pool niche skills and build a highly collaborative community based around the enhanced understanding of shared problems. The project is split into 4 main work packages with 2 technology work packages focusing on integrated model based engineering, cyber-secure architectures and mechanisms, high integrity connectivity, networks and data distribution, advanced hardware platforms and smart sensors and advanced software verification capabilities. In addition, a work package will ensure domain exploitation and drive a cross-industry cyber-security regulatory approach for avionics. A final work package will see the development of integrated HICLASS technology demonstrators.

## Introducing ASSET

HICLASS also aims to build, promote and manage the Aerospace Software Systems Engineering and Technology (ASSET) partnership. This community is open to all organisations undertaking technical work in aerospace software and systems engineering in the UK and operates in a manner designed to promote sharing, openness and accessibility. Unlike HICLASS, ASSET publications are made under a Creative Commons Licence, and the group operates without any non-disclosure or collaboration agreements.

## AdaCore's R&D Work in the UK

Within HICLASS, AdaCore is working with partners across multiple work packages and is also leading a work package titled “SPARK for HICLASS”. This work package will develop and extend multiple SPARK-related technologies in order to satisfy industrial partner’s HICLASS requirements regarding safety and cyber-security.

SPARK is a globally recognised safety and security profile of Ada and a software programming language defined by IEC/ISO 8652:2012. Born out of a UK MOD sponsored research project, the first version of SPARK, based on Ada 83, was initially produced at the University of Southampton. Since then the technology has been progressively extended and refined and the latest version SPARK 2014, based on Ada 2012, is now maintained and developed by AdaCore and Altran in partnership. Due to its rich pedigree, earnt at the forefront of high integrity software assurance, SPARK plays a big part in AdaCore’s safe and secure software development tool offerings. Through focused and collaborative research and development, AdaCore will guide the evolution of multiple SPARK-related technologies towards a level where they are suitable for building demonstrable, safe and secure cyber-physical systems that meet the software implementation and verification requirements of HICLASS developed by UK Plc.

New extensions to the SPARK language, specific to HICLASS systems, will be developed, these will include the verification of cyber-safe systems and auto generated code. There is also a planned maturing of SPARK reusable code modules where AdaCore will be driven by the needs of our partners in providing high assurance reusable SPARK libraries resulting in the reduction of development time and reduced verification costs.

QGen, a qualifiable and tuneable code generation and model verification tool suite for a safe subset of Simulink® and Stateflow® models, is as a game changer in Model Based Software Engineering (MBSE). For HICLASS, AdaCore will place an emphasis on the fusion of SPARK verification capabilities and HICLASS-related emerging MBSE tools, allowing code level verification to be achieved at the model level. The generation of SPARK code, from our QGen tool, as well as various HICLASS partner’s MBSE technologies, will be researched and developed. Collaborative case studies will be performed to assess and measure success. Collaboration is a key critical success factor in meeting this objective; multiple HICLASS partners are developing MBSE tools and SPARK evolution will be achieved in close partnership with them.

The second, and complementary, objective of this work package is to research and develop cyber-secure counter measures and HICLASS verification strategies, namely in the form of compiler hardening and the development of a ‘fuzzing’ capability for Ada/SPARK. HICLASS case studies, produced within proceeding work packages, will be observed to ensure our SPARK work package is aligned with HICLASS specific standards, guidelines and recommendations and to ensure the relevancy of the work package deliverables.

The third objective is for AdaCore, in collaboration with our HICLASS partners, to evaluate QGen, and associated formal approaches, for existing UK aerospace control systems and to make comparisons with existing Simulink code generation processes. In addition, AdaCore will promote processor emulation technology through a collaborative HICLASS case study.

The final objective is to demonstrate the work package technology through the creation of a software stack capable of executing SPARK software on a range of (physical and emulated) target processors suitable for use in HICLASS. The ability to execute code generated from MBSE environments will also be demonstrated.

## Committing Investment into the UK

AdaCore has a long history of working with partners within the UK aerospace industry on safety-related, security-related and mission-critical software development projects. Participation in the HICLASS research and development group complemented AdaCore’s commitment to invest within the UK. This four-year research project is also an excellent fit with AdaCore’s core values and its existing and future capabilities. In addition, the creation of a new UK business unit, ‘AdaCore Ltd’, created to rapidly grow into our UK Centre of Excellence, ensures that our existing and future UK aerospace customers will continue to receive the high level of technical expertise and quality products associated with AdaCore.

History has shown that the UK aerospace industry isn’t afraid to be ambitious and has the technological capability to stay at the forefront of this rapidly growing sector. With HICLASS, the sky really is the limit, and AdaCore welcomes the opportunity to be a part of the journey and further extend our partnerships within this technologically advanced and continually growing market.

Further information about the ATI, BEIS and IUK...

Aerospace Technology Institute (ATI)

The Aerospace Technology Institute (ATI) promotes transformative technology in air transport and supports the funding of world-class research and development through the multi-billion pound joint government-industry programme. The ATI stimulates industry-led R&D projects to secure jobs, maintain skills and deliver economic benefits across the UK.

Setting a technology strategy that builds on the UK’s strengths and responds to the challenges faced by the UK civil aerospace sector; ATI provides a roadmap of the innovation necessary to keep the UK competitive in the global aerospace market, and complements the broader strategy for the sector created by the Aerospace Growth Partnership (AGP).

The ATI provides strategic oversight of the R&T pipeline and portfolio. It delivers the strategic assessment of project proposals and provides funding recommendations to BEIS.

Department for Business, Energy and Industrial Strategy (BEIS)

Department for Business, Energy and industrial Strategy (BEIS) is the government department accountable for the ATI Programme. As the budget holder for the programme, BEIS, is accountable for the final decision regarding projects to progress and fund with Government resources, as well as performing Value for Money (VfM) assessment on all project proposals, one of the 3 ATI Programme assessment streams.

Innovate UK (IUK)

Innovate UK is the funding agency for the ATI Programme. It delivers the competitions process including independent assessment of project proposals, and provides funding recommendations to BEIS. Following funding award, Innovate UK manages the programme, from contracting projects, through to completion.

Innovate UK is part of UK Research and Innovation (UKRI), a non-departmental public body funded by a grant-in-aid from the UK government. Innovate UK drives productivity and economic growth by supporting businesses to develop and realise the potential of new ideas, including those from the UK’s world-class research base.

UKRI is the national funding agency investing in science and research in the UK. Operating across the whole of the UK with a combined budget of more than £6 billion, UKRI brings together the 7 Research Councils, Innovate UK and Research England.

]]>

I’ve been telling Ada developers for a while now that Libadalang will open up the possibility of more-easily writing Ada source code analysis tools.  (You can read more about Libadalang here and here and can also access the project on Github.)

Along these lines, I recently had a discussion with a customer about whether there were any tools for detecting uses of access types in their code which got me thinking about possible ways to detect the use of Access Types in a set of Ada source code files.

GNATcheck doesn't currently have a rule that prohibits the use of access types.  Also, SPARK 2014 recently added support for Access Types, whereas previously they were banned.  So earlier versions of GNATprove could detect them quite effectively, the latest and future versions may not.

I decided to architect a solution to this problem and determined there were several implementation options open to me:

1. Use ‘grep’ on a set of Ada sources to find instances of the "access" Ada keyword
2. Use gnat2xml and then use ‘grep’ on the resulting output to search for certain tags
3. Use gnat2xml and then write an XML-aware search utility to search for certain tags

Option 1 and 2 just feel too easy and would defeat the purpose of this blog post.

Option 3 is perhaps a good topic for another post related to using XML/Ada, however I decided to put my money where my mouth is and go with Option 4!

While I wrote this program in Ada,  I could have written it in Python.

So here is the program:

with Ada.Text_IO;         use Ada.Text_IO;

procedure ptrfinder1 is

LAL_CTX  : constant Analysis_Context := Create_Context;

begin

while not End_Of_File(Standard_Input)
loop

declare

Filename : constant String := Get_Line;

Unit : constant Analysis_Unit := LAL_CTX.Get_From_File(Filename);

function Process_Node(Node : Ada_Node'Class) return Visit_Status is
begin

then
Put_Line(
Source => Filename & ":" & Node.Sloc_Range.Start_Line'Img,
)
);
end if;

return Into;

end Process_Node;

begin

if not Unit.Has_Diagnostics then
Unit.Root.Traverse(Process_Node'Access);
end if;

end ptrfinder1;

I designed the program to read a series of fully qualified absolute filenames from standard input and process each of them in turn.  This approach made the program much easier to write and test and,  as you'll see, allowed the program to be integrated effectively with other tools.

Let's deconstruct the code a little....

For each provided filename,  the program creates a Libadalang Analysis_Unit for that filename.

Read_Standard_Input:
while not End_Of_File(Standard_Input)
loop

declare

Filename : constant String := Get_Line;

Unit : constant Analysis_Unit :=
LAL_CTX.Get_From_File(Filename);

As long as it has no issues,  the Ada unit is traversed and the Process_Node subprogram is executed for each detected node.

if not Unit.Has_Diagnostics then
Unit.Root.Traverse(Process_Node'Access);
end if;

The Process_Node subprogram checks the Kind field of the detected Ada_Node'Class parameter to see if it is any of the access type related nodes.  If so,  the program outputs the fully qualified filename, a ":" delimiter, and the line number of the detected node.

function Process_Node(Node : Ada_Node'Class) return Visit_Status is
begin

then
Put_Line(
Source => Filename & ":" & Node.Sloc_Range.Start_Line'Img,
)
);
end if;

return Into;

end Process_Node;

At the end of the Process_Node subprogram,  the returned value allows the traversal to continue.

To make the program a more useful tool within a development environment based on GNAT Pro,  I integrated it with the piped output of the 'gprls' program.

gprls is a tool that outputs information about compiled sources. It gives the relationship between objects, unit names, and source files. It can also be used to check source dependencies as well as various other characteristics.

My program can then be invoked as part of a more complex command line:

$gprls -s -P test.gpr | ./ptrfinder1 Given the following content of test.gpr: project Test is For Languages use ("Ada"); for Source_Dirs use ("."); for Object_Dir use "obj"; end Test; Plus an Ada source code file called inc_ptr1.adb (in the same directory as test.gpr) containing the following: procedure Inc_Ptr1 is type Ptr is access all Integer; begin null; end Inc_Ptr1; The resulting output from the integration of gprls and my program is: /home/pike/Workspace/access-detector/test/inc_ptr1.adb:3 This output correctly identified the access type usage on line 3 of inc_ptr1.adb. But how do I know that my program or indeed Libadalang has functioned correctly? I decided to stick in principle to the UNIX philosophy of "Do One Thing and Do it Well" and write a second program to verify the output of my first program using a simple algorithm. This second program is given a filename and line number and verifies that the keyword "access" appears on the specified line number. Of course, I could also have embedded this verification into the first program, but to illustrate a point about diversity I chose not to. with Ada.Text_IO; use Ada.Text_IO; with Ada.Directories; use Ada.Directories; with Ada.Strings; use Ada.Strings; with Ada.Strings.Fixed; use Ada.Strings.Fixed; with Ada.IO_Exceptions; procedure ptrfinder2 is begin Read_Standard_Input: while not End_Of_File(Standard_Input) loop Process_Standard_Input: declare Std_Input : constant String := Get_Line; Delimeter_Position : constant Natural := Index(Std_Input,":"); Line_Number_As_String : constant String := Std_Input(Delimeter_Position+1..Std_Input'Last); Line_Number : constant Integer := Integer'Value(Line_Number_As_String); Filename : constant String := Std_Input(Std_Input'First..Delimeter_Position-1); The_File : File_Type; Verified : Boolean := False; begin if Ada.Directories.Exists(Filename) and then Line_Number > 1 then Open(File => The_File, Mode => In_File, Name => Filename); Locate_Line: for I in 1..Line_Number loop Verified := Index(Get_Line(The_File)," access ") > 0; exit when Verified or else End_Of_File(The_File); end loop Locate_Line; Close(File => The_File); end if; if Verified then Put_Line("Access Type Verified on line #" & Line_Number_As_String & " of " & Filename); else Put_Line("Suspected Access Type *NOT* Verified on line #" & Line_Number_As_String & " of " & Filename); end if; end Process_Standard_Input; end loop Read_Standard_Input; end ptrfinder2; I can then string the first and second program together: $ gprls -s -P test.gpr | ./ptrfinder1 | ./ptrfinder2

This produces the output:

Access Type Verified on line #3 of /home/pike/Workspace/access-detector/test/inc_ptr1.adb

It goes without saying that a set of Ada sources with no Access Type usage will result in no output from either the first or second program.

This expedition into Libadalang has reminded me how extremely effective Ada can be at writing software development tools.

The two programs described in this blog post were built and tested on 64-bit Ubuntu 19.10 using GNAT Pro and Libadalang.  They are also known to build successfully with the 64-bit Linux version of GNAT Community 2019.

]]>
RecordFlux: From Message Specifications to SPARK Code https://blog.adacore.com/recordflux-from-message-specifications-to-spark-code Thu, 17 Oct 2019 13:08:23 +0000 Alexander Senier https://blog.adacore.com/recordflux-from-message-specifications-to-spark-code

Software constantly needs to interact with its environment. It may read data from sensors, receive requests from other software  components or control hardware devices based on the calculations performed. While this interaction is what makes software useful in the first place, processing messages from untrusted sources inevitably creates an attack vector an adversary may use to exploit software vulnerabilities. The infamous Heartbleed is only one example where a security critical software was attacked by specially crafted messages.

Implementing those interfaces to the outside world in SPARK and proving the absence of runtime errors is a way to prevent such attacks. Unfortunately, manually implementing and proving message parsers is a tedious task which needs to be redone for every new protocol. In this article we'll discuss the challenges that arise when creating provable message parsers and present RecordFlux, a framework which greatly simplifies this task.

# Specifying Binary Messages

## Ethernet: A seemingly simple example

At a first glance, Ethernet has a simple structure: A 6 bytes destination field, a 6 bytes source field and a 2 bytes type field followed by the payload:

We could try to model an Ethernet frame as a simple record in SPARK:

package Ethernet is

type Byte is mod 2**8;
type Type_Length is mod 2**16;
type Payload is array (1..1500) of Byte;

type Ethernet_Frame is
record
EtherType   : Type_Length;
end record;

end Ethernet;

When looking closer, we realize that this solution is a bit short-sighted. Firstly, defining the payload as a fixed-size array as above will either waste memory when handling a lot of small (say, 64 bytes) frames or be too short when handling jumbo frames which exceed 1500 bytes. More importantly, the Ethernet header is not as simple as we pretended earlier. Looking at the standard, we realize that the EtherType field actually has a more complicated semantics to allow different frame types to coexist on the same medium.

If the value of EtherType is greater or equal to 1536, then it contains an Ethernet II frame. The EtherType is treated as a type field which determines the protocol contained in Data. In that case, the Data field uses up the remainder of the Ethernet frame. If the value of EtherType is less or equal to 1500, then it contains a IEEE 802.3 frame and the EtherType field represents the length of the Data field. Frames with an EtherType value between 1501 and 1535 are considered invalid.

To make things even worse, both variants may contain an optional IEEE 802.1Q tag to identify the frames priority and VLAN. The tag is inserted after the source field and itself is comprised of two 16 bit fields, TPID and TCI. It is present if the bytes that would contain the TPID field have a hexadecimal value of 8100. Otherwise these bytes contain the EtherType field.

Lastly, the Data field will usually contain higher-level protocols. Which protocol is contained and how the payload is to be interpreted depends on the value of EtherType. With our naive approach above, we have to manually convert Data into the correct structured message format. Without tool support, this conversion will be another source of errors.

## Formal Specification with RecordFlux

Next, we'll specify the Ethernet frame format using the RecordFlux domain specific language and demonstrate how the specification is used to create real-world parsers. RecordFlux, deliberately has a syntax similar to SPARK, but deviates where more expressiveness was required to specify complex message formats.

### Packages and Types

Just like SPARK, entities are grouped into packages. By convention, a package contains one protocol like IPv4, UDP or TLS. A protocol will typically define many types and different message formats. Range types and modular types are identical to those found in SPARK. Just like in SPARK, entities can be qualified using aspects, e.g. to specify the size of a type using the Size aspect:

package Ethernet is

type Type_Length is range 46 .. 2**16 - 1 with Size => 16;
type TPID is range 16#8100# .. 16#8100# with Size => 16;
type TCI is mod 2**16;

end Ethernet;

The first difference to SPARK is the message keyword which is similar to records, but has important differences to support the non-linear structure of messages. The equivalent of the naive Ethernet specification in RecordFlux syntax would be:

type Simplified_Frame is
message
Type_Length : Type_Length;
end message;

### Graph Structure

As argued above, such a simple specification is insufficient to express the complex corner-cases found in Ethernet. Luckily, RecordFlux messages allow for expressing conditional, non-linear field layouts. While SPARK records are linear sequences of fixed-size fields, messages should rather be thought of as a graph of fields where the next field, its start position, its length and constraints imposed by the message format can be specified in terms of other message fields. To ensure that the parser generated by RecordFlux is deadlock-free and able to parse messages sequentially, conditionals must only reference preceding fields.

We can extend our simple example above to express the relation of the value of Type_Length and length of the payload field:

   ...
Type_Length : Type_Length
then Data
with Length => Type_Length * 8
if Type_Length <= 1500,
then Data
with Length => Message’Last - Type_Length’Last
if Type_Length >= 1536;
...

For a field, the optional then keyword defines the field to follow in the graph. If that keyword is missing, this defaults to the next field appearing in the message specification as in our Simplified_Frame example above. To have different fields follow under different conditions, an expression can be added using the if keyword. Furthermore, an aspect can be added using the with keyword, which can be used to conditionally alter properties like start or length of a field. If no succeeding field is specified for a condition, as for Type_Length in the range between 1501 and 1535, the message is considered invalid.

In the fragment above, we use the value of the field Type_Length as the length of the Data field if its value is less or equal to 1500 (IEEE 802.3 case). If Type_Length is greater or equal to 1536, we calculate the payload length by subtracting the end of the Type_Length field from the end of the message. The 'Last (and also 'First and 'Length) attribute is similar to the respective SPARK attribute, but refer to the bit position (or bit length) of a field within the message. The Message field is special and refers to the whole message being handled.

### Optional Fields

The graph structure described above can also be used to handle optional fields, as for the IEEE 802.1Q tag in Ethernet. Let's have a look at the full Ethernet specification first:

package Ethernet is
type Type_Length is range 46 .. 2**16 - 1 with Size => 16;
type TPID is range 16#8100# .. 16#8100# with Size => 16;
type TCI is mod 2**16;

type Frame is
message
Type_Length_TPID : Type_Length
then TPID
with First => Type_Length_TPID’First
if Type_Length_TPID = 16#8100#,
then Type_Length
with First => Type_Length_TPID’First
if Type_Length_TPID /= 16#8100#;
TPID : TPID;
TCI : TCI;
Type_Length : Type_Length
then Data
with Length => Type_Length * 8
if Type_Length <= 1500,
then Data
with Length => Message’Last - Type_Length’Last
if Type_Length >= 1536;
then null
if Data’Length / 8 >= 46
and Data’Length / 8 <= 1500;
end message;
end Ethernet;

Most concepts should look familiar by now. The null field used in the then expression of the Data field is just a way to state that the end of the message is expected. This way, we are able to express that the payload length must be between 46 and 1500. As there is only one then branch for payload (pointing to the end of the message), values outside this range will be considered invalid. This is the general pattern to express invariants that have to hold for a message.

How can this be used to model optional fields of a message? We just need to cleverly craft the conditions and overlay the following alternatives. The relevant section of the above Ethernet specification is the following:

   ...
Type_Length_TPID : Type_Length
then TPID
with First => Type_Length_TPID’First
if Type_Length_TPID = 16#8100#,
then Type_Length
with First => Type_Length_TPID’First
if Type_Length_TPID /= 16#8100#;
TPID : TPID;
TCI : TCI;
Type_Length : Type_Length
...

Remember that the optional IEEE 802.1Q tag consisting of the TPID and TCI fields is present after the Source only if the bytes that would contain the TPID field are equal to a hexadecimal value of 8100. We introduce a field Type_Length_TPID only for the purpose of checking whether this is the case. To avoid any confusion when using the parser later, we will overlay this field with properly named fields. If Type_Length_TPID equals 16#8100# (SPARK-style numerals are supported in RecordFlux), we define the next field to be TPID and set its first bit to the 'First attribute of the Type_Length_TPID field. If Type_Length_TPID does not equal 16#8100#, the next field is Type_Length, skipping TPID and TCI.

As stated above, the specification actually is a graph with conditions on its edges. Here is an equivalent graph representation of the full Ethernet specification:

# Working with RecordFlux

RecordFlux comes with the command line tool rflx which parses specification files, transforms them into an internal representation and subsequently generates SPARK packages that can be used to parse the specified messages:

To validate specification files, which conventionally have the file ending .rflx, run RecordFlux in check mode. Note, that RecordFlux does not support search paths at the moment and all files need to be passed on the command line:

$rflx check ethernet.rflx Parsing ethernet.rflx... OK ## Code Generation Code is generated with the generate subcommand which expects a list of RecordFlux specifications and an output directory for the generated SPARK sources. Optionally, a root package can be specified for the generated code using the -p switch: $ rflx generate -p Example ethernet.rflx outdir
Parsing ethernet.rflx... OK
Generating... OK
Created outdir/example-scalar_sequence.adb

## Usage

To use the generated code, we need to implement a simple main program. It allocates a buffer to hold the received Ethernet packet and initializes the parser context with the pointer to that buffer. After verifying the message and checking whether it has the correct format, its content can be processed by the application.

with Ada.Text_IO; use Ada.Text_IO;
with Example.Ethernet.Frame;
with Example.Types;

procedure Ethernet
is
package Eth renames Example.Ethernet.Frame;
subtype Packet is Example.Types.Bytes (1 .. 1500);
Buffer : Example.Types.Bytes_Ptr := new Packet'(Packet'Range => 0);
Ctx    : Eth.Context := Eth.Create;
begin
-- Retrieve the packet
Buffer (1 .. 56) :=
(16#ff#, 16#ff#, 16#ff#, 16#ff#, 16#ff#, 16#ff#, 16#04#, 16#d3#,
16#b0#, 16#ab#, 16#f9#, 16#31#, 16#08#, 16#06#, 16#00#, 16#01#,
16#08#, 16#00#, 16#06#, 16#04#, 16#00#, 16#01#, 16#04#, 16#d3#,
16#b0#, 16#9c#, 16#79#, 16#53#, 16#ac#, 16#12#, 16#fe#, 16#b5#,
16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#ac#, 16#12#,
16#64#, 16#6d#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#,
16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#, 16#00#);

Eth.Initialize (Ctx, Buffer);
Eth.Verify_Message (Ctx);
if Eth.Structural_Valid_Message (Ctx) then
Put_Line ("Source: " & Eth.Get_Source (Ctx)'Img);
if Eth.Present (Ctx, Eth.F_TCI) and then Eth.Valid (Ctx, Eth.F_TCI)
then
Put_Line ("TCI: " & Eth.Get_TCI (Ctx)'Img);
end if;
end if;
end Ethernet;

The code can be proven directly using gnatprove (the remaining warning is a known issue in GNAT Community 2019 as explained in the Known Issues section of the README):

$gnatprove -P ethernet.gpr Phase 1 of 2: generation of Global contracts ... Phase 2 of 2: flow analysis and proof ... ethernet.adb:22:25: warning: unused assignment to "Buffer" gnatprove: error during flow analysis and proof What guarantees can we obtain from this proof? Firstly, the absence of runtime errors is shown for the generated code as well as for the user code. No matter what input is read into the Buffer variable and presented to the parser, the program does not raise an exception at runtime. Furthermore, its control flow cannot be circumvented e.g. by buffer overflows or integer overflows. This is called "silver level" in SPARK parlance. Additionally, we prove key integrity properties ("gold level"), e.g. that optional fields are accessed if and only if all requirements defined in the RecordFlux specification are met. # Case Studies RecordFlux greatly eases the specification and handling of binary messages. But is it suitable for real-world applications? We conducted a number of case studies to validate that it in fact is. ## Packet Sniffer Packet sniffers are tools often used by administrators to diagnose network problems. They capture and dissect all data received on a network interface to allow for a structured analysis of packet content. Famous open source examples are Wireshark and tcpdump. As packet sniffers need to handle a large amount of complex protocols, they also tend to be complex. There have been errors which allowed attackers to mount remote exploits against packet sniffers. For this reason, e.g. it is discouraged to run Wireshark under a privileged user account. Obviously a formally verified packet sniffer is desirable to eliminate the risk of an attack when analyzing traffic from untrusted sources. We prototyped a very simple packet sniffer for IP/UDP on Linux for which we proved the absence of runtime errors. The output is similar to other packet sniffers: $ sudo ./obj/sniff_udp_in_ip

IP: Version: 4 IHL: 5 DSCP: 0 ECN: 0 TLen: 53 Id: 9059 DF: 1 MF: 0
FOff: 0 TTL: 64 Proto: UDP HCSum: 6483 Src: 127.0.0.1 Dst: 127.0.0.1,
UDP: SPort: 58423 DPort: 53 Len: 33 CSum: 65076 Payload: b9 7d 01
00 00 01 00 00 00 00 00 00 04 63 6f 61 70 02 6d 65 00 00 1c 00 01

## TLS 1.3

Another area where correct parsers are essential are security protocols. The most important security protocol on the internet is TLS - whenever a browser connects to a remote server using the https protocol, in the background some version of TLS is used. We formalized the message format of the latest TLS version 1.3 according to RFC 8446 and generated a SPARK parser for TLS from the specification.

An open question remained, though: Can the generated code handle real-world TLS traffic and is there a performance penalty compared to unverified implementations? While we are working on a verified component-based TLS 1.3 implementation completely done in SPARK, it is not yet available for this experiment. As an alternative, we used an open source TLS 1.3 library by Facebook named Fizz and replaced its TLS parser by our generated code. As the C++ types used by Fizz (e.g. vectors) could not easily be bound to Ada, glue code had to be written manually to translate between the C++ and the SPARK world. We ensured that all untrusted data the library comes in touch with is handled by SPARK code. For the SPARK part, we proved the absence of runtime errors and the invariants stated in the specification.

Our constructive approach turned out to be effective for improving the security of existing software. In CVE-2019-3560 an integer overflow was found in the Fizz library. Just by sending a short, specially crafted sequence of messages, an attacker could mount a Denial of Service attack against an application using Fizz by putting it into an infinite loop. While Facebook fixed this bug by using a bigger integer type, the RecordFlux parser prevents this issue by rejecting packets with invalid length fields.

Despite the required transformations, the performance overhead was surprisingly low. For the TLS handshake layer - the part that negotiates cryptographic keys when the communication starts - the throughput was 2.7% lower than for the original version. For the TLS record layer, which encrypts and decrypts packets when a connection is active, the throughput was only 1.1% lower:

# Conclusion and Further Information

With RecordFlux, creating SPARK code that handles complex binary data has become a lot easier. With a proper specification the generated code can often be proven automatically. In the future, we will extend RecordFlux to support the generation of binary messages and the modeling of the dynamic behavior of binary protocols.

For more information see our research paper and the language reference. If you have comments, found a bug or have suggestions for enhancements, feel free to open an issue on GitHub or write an email.

]]>
The Power of Technology Integration and Open Source https://blog.adacore.com/the-power-of-technology-integration Tue, 15 Oct 2019 12:04:00 +0000 Arnaud Charlet https://blog.adacore.com/the-power-of-technology-integration

Part of our core expertise at AdaCore is to integrate multiple technologies as smoothly as possible and make it a product. This started at the very beginning of our company by integrating a code generator (GCC) with an Ada front-end (GNAT) which was then followed by integrating a debugger engine (GDB) and led to today's rich GNAT Pro offering.

Today we are going much further in this area and I am going to give you a few concrete examples in this post.

For example take our advanced static analysis engine CodePeer and let's look at it from two different angles (a bit like bottom-up and top-down if you will): what does it integrate, and what other products integrate it?

From the first perspective, CodePeer integrates many different and complex pieces of technology: the GNAT front-end, various "core" GNAT tools (including GNATmetric, GNATcheck, GNATpp), the AdaCore IDEs (GNAT Studio, GNATbench), GNATdashboard, a Jenkins plug-in, GPRbuild, as well as the codepeer engine itself and finally as of version 20, libadalang and light checkers based on libadalang (aka "LAL checkers").

Thanks to this complex integration, CodePeer users can launch various tools automatically and get findings stored in a common database, and get a common user interface to drive it. Indeed, from CodePeer you can get access to: GNAT warnings, GNATcheck messages, LAL checkers, CodePeer "advanced static analysis" messages.

And the list will continue growing in the future, but that's for another set of posts!

Now from the other perspective, the codepeer engine is also integrated as an additional prover in our SPARK Pro product, to complement the SMT solvers integrated in SPARK, and is also used in our QGen product as the back-end of the QGen Verifier which performs static analysis on Simulink models.

Speaking of SPARK, this is also another good example of complex integration of many different technologies: the GNAT front-end, GPRbuild, GNAT Studio and GNATbench, GNATdashboard, Why3, CVC4, Z3, Alt-Ergo, the codepeer engine, and the SPARK engine itself.

Many of these components are developed in house at AdaCore, and many other components are developed primarily outside AdaCore (such as GCC, GDB, Why3, CVC4, ...). Such complex integration is only possible because all these components are Open Source and precisely allow this kind of combination. Add on top of that AdaCore's expertise in such integration and productization, and you get our ever growing offering!

]]>
Learning SPARK via Conway's Game of Life https://blog.adacore.com/learning-spark Thu, 10 Oct 2019 11:42:21 +0000 Michael Frank https://blog.adacore.com/learning-spark

I began programming in Ada in 1990 but, like many users, it took me a while to become a fan of the language. A brief interlude with Pascal in the early 90’s gave me a better appreciation of the strong-typing aspects of Ada, but continued usage helped me learn many of the other selling points.

I’ve recently started working at AdaCore as a Mentor, which means part of my job is to help companies transition from other languages to Ada and/or SPARK. At my last job, the development environment was mixed C++ and Ada (on multiple platforms) so, to simplify things, we were coding in Ada95. Some of our customers’ code involved Ada2005 and Ada2012 constructs, so I did learn the new language variants and see the benefits, but I hadn’t actually written anything using contracts or expression functions or the like.

We all know the best way to learn is to do, so I needed to come up with something that would require me to increase my knowledge not only of the Ada language, but also to get some more experience with the AdaCore tools (GPS, CodePeer) as well as learning how to prove the correctness of my SPARK code. So I settled on Conway’s Game of Life (Wikipedia). This is an algorithmic model of cell growth on a 2-dimensional grid following four simple rules about cell life and death:

• A live cell with fewer than two live neighbors dies (underpopulation)
• A live cell with two or three live neighbors stays alive (life)
• A live cell with more than three live neighbors dies (overpopulation)
• A dead cell with exactly three live neighbors becomes a live cell (birth)

This little application gave me everything I needed to experiment with, and even a little more. Obviously, this was going to be written in Ada following SPARK rules, so I could program using the latest language variant. The development environment was going to be AdaCore’s GPS Integrated Development environment, and I could use CodePeer for static analysis on the code I was writing, so I could get some experience with these tools. I then could use the SPARK provers to prove that my code was correct. In addition, as Life has a visual aspect, I was able to do a little work with GtkAda to make a graphical representation of the algorithm.

I started with the lower level utilities needed to perform the algorithm – counting the number of live neighbors. Easy, right? Yes, it is easy to code, but not so easy to prove using the SPARK provers. If I wasn’t worried about provability, I could just write a simple doubly-nested loop that would iterate over rows and columns around the target cell, and count how many neighboring cells are alive.

The issue is that, with loops in SPARK, we need to “remind” SPARK what has happened in previous loop iterations (typically using “pragma Loop_Invariant”) so it has all the knowledge it needs to prove the current iteration. This issue is made more difficult when we use nested loops. So, I rewrote my “count all neighbors” routine into two routines - “count neighbors in a row” and “total neighbors in all rows”. Each routine has a single loop, which made specifying the loop invariant much easier.

And that “rewriting” helped reinforce my understanding of “good coding practices”. Most of us were taught that high-complexity subroutines were problematic because they were difficult to test. So we write code that tries to keep the complexity down. The next level in safety-critical software – provability – is also made more difficult by higher complexity subroutines. In my case, my counting routines needed to check to see if a cell was alive and, if so, increment the count. That’s just a simple “if” statement. But, to make it easier on the prover, I created a new routine that would return 1 if the cell was alive and 0 otherwise. Very simple, and therefore very easy to prove correctness. Now, my counting routine could just iterate over the cells, and sum the calls to the function – once again, easier to prove.

Once the low-level functions were created the “real” coding began. And in trying to write pre- and post-conditions for higher level algorithms, I found a great help in the CodePeer tool. This tool performs static analysis on the code base, from simple and quick to exhaustive and slow. The correlation between this tool and the prover tools is that, if I was having a problem writing a valid postcondition, more often than not a CodePeer report would show some warning indicating my preconditions were not complete. For example, every time you add numbers, CodePeer would remind you that a precondition would have to be set to ensure no overflow. Without that warning, everything looks fine, but the prover will not be able to validate the routine. In addition, when using GPS, CodePeer will insert annotations into the Source Code View detailing many of the aspects of each subprogram. These annotations typically include pre- and post-conditions, making it easier to determine which of these aspects could be written into the code to make proving easier.

So, a development cycle was created. First, write a subprogram to “do something” (usually this means writing other subprograms to help!) Next, run CodePeer to perform static analysis. In addition to finding run-time errors, the annotations would help determine pre-conditions for the routines. I would study these annotations, and, for each pre- and post-condition, I had to determine if it made sense or not. If the condition did not make sense, I needed to modify my subprogram to get the correct results. If the condition made sense, then I needed to decide if I wanted to encode that condition using aspects, or just ignore – because sometimes the annotations were more detailed than I needed. Once I got past CodePeer analysis, I would try to prove the subprograms I created. If they could be proven, I could go onto the next step in building my application. If not, I needed to modify my code and start the cycle over.

This process is basically a one-person Code Review! In a large software project, these steps would be the same on some portion of the code, and only then would the code go into a peer review process – with the benefit that the reviewers are only concerned with design issues, and don’t have to be worried about dealing with run-time errors or verifying that the implementation was correct. So, not only did I achieve my objective of learning aspects of the languages and the tools, I got first-hand evidence of the benefits of writing software “the right way!”

To access the source code for this project, please checkout my github repository here: https://github.com/frank-at-adacore/game_of_life

]]>
Pointer Based Data-Structures in SPARK https://blog.adacore.com/pointer-based-data-structures-in-spark Tue, 08 Oct 2019 12:19:34 +0000 Claire Dross https://blog.adacore.com/pointer-based-data-structures-in-spark

As seen in a previous post, it is possible to use pointers (or access types) in SPARK provided the program abides by a strict memory ownership policy designed to prevent aliasing. In this post, we will demonstrate how to define pointer-based data-structures in SPARK and how to traverse them without breaking the ownership policy.

Pointer-based data structures can be defined in SPARK as long as they do not introduce cycles. For example, singly-linked lists and trees are supported whereas doubly-linked lists are not. As an example for this post, I have chosen to use a map encoded as a singly-linked list of pairs of a key and a value. To define a recursive data structure in Ada, you need to use an access type. This is what I did here: I declared an incomplete view of the type Map, and then an access type Map_Acc designating this view. Then, I could give the complete view of Map as a record with a component Next of type Map_Acc:

   type Map;
type Map_Acc is access Map;
type Element_Acc is not null access Element;
type Map is record
Key   : Positive;
Value : Element_Acc;
Next  : Map_Acc;
end record;

Note that I also used an access type for the element that is stored in the map (and not for the key). Indeed, we assume here that the elements stored in the map can be big, so that we may want to modify them in place rather than copying them. We will see later how this can be done.

To be able to write specifications on my maps, I need to define some basic properties for them. To describe precisely my maps, I will need to speak about:

• whether there is a mapping for a given key in my map, and
• the value associated to a key by the map, knowing that there is a mapping for this key.

These concepts are encoded as the following functions:

   function Model_Contains (M : access constant Map; K : Positive) return Boolean;

function Model_Value (M : access constant Map; K : Positive) return Element
with  Pre => Model_Contains (M, K);

Model_Contains should return True when there is an occurrence of the key K in the map M, and Model_Value should retrieve the element associated with the first occurrence of K in M.

Now, I need to give a definition to my functions. The natural way to express the meaning of properties on linked data-structures is through recursion. This is what I have done here for both Model_Contains and Model_Value:

   function Model_Contains (M : access constant Map; K : Positive) return Boolean is
(M /= null and then (M.Key = K or else Model_Contains (M.Next, K)));
--  A key K has a mapping in a map M if either K is the key of the first mapping of M or if
--  K has a mapping in M.Next.

function Model_Value (M : access constant Map; K : Positive) return Element is
(if M.Key = K then M.Value.all else Model_Value (M.Next, K));
--  The value mapped to key K by a map M is either the value of the first mapping of M if
--  K is the key this mapping or the value mapped to K in M.Next otherwise.

Note that we do not need to care about cases where there is no mapping for K in M in the definition of Model_Value, as we have put as a precondition that it cannot be used in such cases.

As these functions are recursive, GNATprove cannot determine that they will terminate. As a result, it will not use their definition in some cases, lest they may be incorrect. To avoid this problem, I have added Terminating annotations to their declarations. These annotations cannot be verified by the tool, but they can easily be discharged by review, so I accepted the check messages with appropriate justification. I have also marked the two functions as ghost, as I will assume that, for efficiency reasons or to control the stack usage, we don't want to use recursive functions in normal code:

   function Model_Contains (M : access constant Map; K : Positive) return Boolean
with
Ghost,
Annotate => (GNATprove, Terminating);
pragma Annotate
(GNATprove, False_Positive,
"subprogram ""Model_Contains"" might not terminate",
"Recursive calls occur on strictly smaller structure");

function Model_Value (M : access constant Map; K : Positive) return Element
with
Ghost,
Annotate => (GNATprove, Terminating),
Pre => Model_Contains (M, K);
pragma Annotate
(GNATprove, False_Positive,
"subprogram ""Model_Value"" might not terminate",
"Recursive calls occur on strictly smaller structure");

Now that we have defined our specification properties for maps, let's try to implement some functionality. We will start with the easiest one, a Contains function which checks whether there is mapping for a given key in a map. In terms of functionality, it should really return the same value as Model_Contains. However, implementation-wise, we would like to use a loop to avoid introducing recursion. Here, we need to be careful. Indeed, traversing a linked data-structure using a loop generally involves an alias between the traversed structure and the pointer used for the traversal. SPARK supports this use case through the concepts of local observers and borrowers (borrowed from Rust's regular and mutable borrows). What we need here is an observer. Intuitively, it is a local variable which is granted for the duration of its lifetime read-only access to a component of another object. When the observer stays in scope, the actual owner of the object also has read-only access on the object. When the observed goes out of scope, it regains complete ownership (provided it used to have it, and there are no other observers in scope). Let us see how we can use it in our example:

   function Contains (M : access constant Map; K : Positive) return Boolean with
Post => Contains'Result = Model_Contains (M, K);

function Contains (M : access constant Map; K : Positive) return Boolean is
C : access constant Map := M;
begin
while C /= null loop
pragma Loop_Invariant (Model_Contains (C, K) = Model_Contains (M, K));
if C.Key = K then
return True;
end if;
C := C.Next;
end loop;
return False;
end Contains;

Because it is declared with an anonymous access-to-constant type, the assignment to C at its declaration is not considered to be a move, but an observe. In the body of Contains, C will be a read-only alias of M. M itself will retain read-only access to the data it designates, and will regain full ownership at the end of Contains (note that here, it does not change much, as M itself is an access-to-constant type, so it has read-only access to its designated data to begin with).

In the body of Contains, we use a loop which searches for a key equal to K in M. We see that, even though C is an observer, we can still assign to it. It is because it is the data designated by C which is read-only, and not C itself. When we assign directly into C, we do not modify the underlying structure, we simply modify the handle. Note that, as usual with loops in SPARK, I had to provide a loop invariant to help GNATprove verify my code. Here it states that K is contained in M, if and only if, it is contained in C. This is true because we have only traversed values different from K until now. We see that, in the invariant, we are allowed to mention M. This is because M retains read-only access to the data it designated during the observe.

Let's now write a function to retrieve in normal code the value associated to a key in a map. Since elements can be big, we don't want to copy them, so we should not return the actual value, but rather a pointer to the object stored in the map. For now, let's assume that we are interested in a read-only access to the element. As we have seen above, in the ownership model of SPARK, a read-only access inside an existing data-structure is an observer. So here, we want a function which computes and returns an observer of the input data-structure. This is supported in SPARK, provided the traversed data-structure is itself an access type, using what we call "traversal functions". An observing traversal function takes an access type as its first parameter and returns an anonymous access-to-constant object which should be a component of this first parameter.

   function Constant_Access (M : access constant Map; K : Positive) return not null access constant Element
with
Pre  => Model_Contains (M, K),
Post => Model_Value (M, K) = Constant_Access'Result.all;

function Constant_Access (M : access constant Map; K : Positive) return not null access constant Element
is
C : access constant Map := M;
begin
while C /= null loop
pragma Loop_Invariant (Model_Contains (C, K) = Model_Contains (M, K));
pragma Loop_Invariant (Model_Value (C, K) = Model_Value (M, K));
if C.Key = K then
return C.Value;
end if;
C := C.Next;
end loop;
raise Program_Error;
end Constant_Access;

The return type of Constant_Access is rather verbose. It states that it computes an anonymous access-to-constant object (an observer in SPARK) which is not allowed to be null. Indeed, since we know that we have a mapping for K in M when calling Constant_Access, we are sure that there will always be an element to return. This also explains why we have a raise statement as the last statement of Constant_Access: GNATprove is able to prove that this statement is unreachable, and we need that statement to confirm that the bottom of the function cannot be reached without returning a value.

The contract of Constant_Access is straightforward, as we are again reimplementing a concept that we already had in the specification (Constant_Access returns a pointer to the result of Model_Value). In the body of Constant_Access, we create a local variable C which observes the data-structure designated by M, just like we did for Contains. When the correct mapping is found, we return an access to the corresponding value in the data-structure. GNATprove will make sure that the structure returned is a part of the first parameter (here M) of the function.

Note that this function does not breach the ownership policy of SPARK as what is computed by the Constant_Access is an observer, so it will not be possible to assign it to a component inside another data-structure for example.

Now let's assume that we want to modify the value associated to a given key in the data-structure. We can provide a Replace_Element procedure which takes a map M, a key K, and an element V and replaces the value associated to K in M by V. We will write in its postcondition that the value associated to K in M after the call is V (this is not complete, as we do not say anything about other mappings, but for the sake of the explanation, let's stick to something simple).

   procedure Replace_Element (M : access Map; K : Positive; V : Element) with
Pre  => Model_Contains (M, K),
Post => Model_Contains (M, K) and then Model_Value (M, K) = V;

In its body, we want to loop, find the matching pair, and replace the element by the new value V. Here we cannot use an observer to search for the key K, as we want to modify the mapping afterward. Instead, we will use a local borrower. Just like an observer, a borrower takes the ownership of a part of an existing data-structure for the duration of its life-time, but it takes full ownership, in the sense that a borrow has the right to both read and modify the borrowed object. While the borrower is in scope, the borrowed object cannot be accessed directly (there is a provision for reading it in the RM that is used by the tool in particular cases, we will see later). At the end of the scope of the borrower, the ownership returns to the borrowed object. A borrower in SPARK is introduced by the declaration of an object of an anonymous access-to-variable type (note the use of "access Map" instead of "access constant Map"):

   procedure Replace_Element (M : access Map; K : Positive; V : Element) is
X : access Map := M;
begin
while X /= null loop
pragma Loop_Invariant (Model_Contains (X, K));
pragma Loop_Invariant
(Pledge (X, (if Model_Contains (X, K) then Model_Contains (M, K)
and then Model_Value (M, K) = Model_Value (X, K))));

if X.Key = K then
X.Value.all := V;
return;
end if;
X := X.Next;
end loop;
end Replace_Element;

The body of Replace_Element is similar to the body of Contains, except that we modify the borrower before returning from the procedure. However, we see that the loop invariant is more involved. Indeed, as we are modifying M using X, we need to know how the modification of X will affect M. Usually, GNATprove can track this information without help, but when loops are involved, this information needs to be supplied in the loop invariant. To describe how a structure and its borrower are affected, I have used a pledge. The notion of pledges was introduced by researchers from ETH Zurich to verify Rust programs (see the preprint of their work to be published at OOPSLA this year). Conceptually, a pledge is a property which will always hold between the borrowed object and the borrower, no matter the modifications that are made to the borrower. As pledges are not yet supported at a language level in SPARK, it is possible to mark (a part of) an assertion as a pledge by using an expression function which is annotated with a Pledge Annotate pragma:

   function Pledge (Borrower : access constant Map; Prop : Boolean) return Boolean is
(Prop)
with Ghost,
Annotate => (GNATProve, Pledge);
--  Pledge for maps

Note that the name of the function could be something other than "Pledge", but the annotation should use the string "Pledge". A pledge function is a ghost expression function which takes a borrower and a property and simply returns the property. When GNATprove encounters a call to such a function, it knows that the property given as a second parameter to the call must be handled as a pledge of the local borrower given as a first parameter. It will attempt to verify that, no matter the modification which may be done to Borrower, the property will still hold. Inside Prop, it is necessary to be able to mention the borrowed object. This is why there is provision for reading it in the SPARK RM, and in fact, it is the only case where the tool will allow a borrowed object to be mentioned.

In our example, the pledge of X states that, no matter how X will be modified afterward, if it happens that X has a mapping for K, then M will have the same mapping for K. This is true because we have not encountered K in the previous iterations of the loop, so if we find a mapping in X for K, it will be the first such mapping in M too. Note that a more precise pledge, like Pledge (X, Model_Contains (X, K)), would not be correct. Given a borrower aliasing the root of a subtree in a borrowed object, the pledge relationship expresses what necessarily holds for that subtree, independently of any modifications that may occur to it through the borrower. This is why we need to state the pledge here with an if-expression: "if the borrower X still contains the key K, then M necessarily contains the key K, and both agree on the associated value".

After the loop, M has been modified through X. GNATprove does not know anything about it but what can be deduced from the current value of X and information about the relation between M and X supplied by the loop-invariants that contain pledges. These invariants don't completely define the relation between X and M, but they give enough information to deduce the postcondition when a mapping for K is found in X. Since at the last iteration X.Key = K, GNATprove can deduce that Model_Contains (X, K) and Model_Value (X, K) = V holds after the loop. Using the pledges from the loop-invariants, it can infer that we also have Model_Contains (M, K) and Model_Value(M, K) = V.

Before we reach the end of this post, we will go one step further. We now have a way to replace an element with another one in the map. It can be used if we want to do a modification inside an element of the map, but it won't be efficient, as the element will need to be copied. We would like to provide a way to find the value associated to a given key in a map, and return an access to it so that it can be modified in-place. As for Constant_Reference, this can be done using a traversal function:

   function Pledge (Borrower : access constant Element; Prop : Boolean) return Boolean is
(Prop)
with Ghost,
Annotate => (GNATProve, Pledge);
--  Pledge for elements

function Reference (M : access Map; K : Positive) return not null access Element
with
Pre  => Model_Contains (M, K),
Post => Model_Value (M, K) = Reference'Result.all and then
Pledge (Reference'Result, Model_Contains (M, K) and then
Model_Value (M, K) = Reference'Result.all);

Reference returns a mutable access inside the data-structure M. We can see that, in its postcondition, I have added a pledge. Indeed, since the result of Reference is an access-to-variable object, a user of my function can use it to modify M. If I want the tool to be able to deduce anything about such a modification, I need to describe the link between the result of a call to the Reference function, and its first parameter. Here my pledge gives the same information as the postcondition of Replace_Element, that is, that the value designated by the result of the call will be the one mapped from K in M.

   function Reference (M : access Map; K : Positive) return not null access Element
is
X : access Map := M;
begin
while X /= null loop
pragma Loop_Invariant (Model_Contains (X, K));
pragma Loop_Invariant (Model_Value (X, K) = Model_Value (M, K));
pragma Loop_Invariant
(Pledge (X, (if Model_Contains (X, K) then Model_Contains (M, K)
and then Model_Value (M, K) = Model_Value (X, K))));

if X.Key = K then
return X.Value;
end if;
X := X.Next;
end loop;
raise Program_Error;
end Reference;


The body of Reference contains the same loop as Constant_Reference, except that I have added a loop invariant, similar to the one in  Replace_Element to supply a pledge to X.

The specification and verification of pointer-based data-structures is a challenge in deductive verification whether the "pointer" part is implemented as a machine pointer or as an array index (as an example, see our previous post about verifying insertion inside a red-black tree). In addition, SPARK has a strict ownership policy which will prevent completely the use of some (doubly-linked) data-structures, and complicate the writing of usual algorithms on others. However, I think I have demonstrated in this post that is still feasible to write and verify in SPARK some of these algorithms, with comparatively few user-supplied annotations.

]]>
Combining GNAT with LLVM https://blog.adacore.com/combining-gnat-with-llvm Tue, 01 Oct 2019 12:24:29 +0000 Arnaud Charlet https://blog.adacore.com/combining-gnat-with-llvm

## Presenting the GNAT LLVM project

At AdaCore labs, we have been working for some time now on combining the GNAT Ada front-end with a different code generator than GCC.

The GNAT front-end is particularly well suited for this kind of exercise.  Indeed, we've already plugged many different code generators into GNAT in the past, including a Java byte code generator (the old "JGNAT" product for those who remember it), a .NET byte code generator derived from the Java one, a Why3 generator, used at the heart of the SPARK formal verification technology, a SCIL (Statically Checkable Intermediate Language) generator used in our advanced static analyzer CodePeer, and a C back-end used in GNAT CCG (Common Code Generator).

This time, we're looking at another general purpose code generator, called LLVM, in order to expand the outreach of Ada to the LLVM ecosystem (be it the compiler itself or other components such as static analysis tools).

This work-in-progress research project is called "GNAT LLVM" and is meant to show the feasibility of generating LLVM bytecode for Ada and to open the LLVM ecosystem to Ada, including tools such as KLEE, that we are also planning to work with and add Ada support for. Note that we are not planning on replacing any existing GNAT port based on GCC, so this project goes in addition rather than in replacement.

## Technical Approach

We decided on a "pure" LLVM approach that's as easy to integrate and fit into the LLVM ecosystem as possible, using the existing LLVM API directly, while at the same time doing what we do best: write Ada code! So we are using the LLVM "C" API and generate automatically Ada bindings via the GCC -fdump-ada-spec switch and a bit of postprocessing done in a python script, that we can then call directly from Ada, which allows us to both easily traverse the GNAT tree and generate LLVM instructions, all in Ada.

By the way, if you know about the DragonEgg project then a natural question would be "why are you starting a GNAT LLVM project from scratch instead of building on top of DragonEgg?". If you want to know the answer, check the file README.dragonegg in the repository!

## Next Steps

We have just published the GNAT LLVM tool sources licensed under GPLv3 on GitHub for hobbyists and researchers to experiment with it and give us their feedback.

If you are interested, give it a try, and let us know via Issues and Pull Requests on GitHub how it works for you, or by leaving comments at the bottom of this page! One exciting experiment would be to compile Ada code using the webassembly LLVM back-end for instance!

]]>

AdaCore’s fourth annual Make with Ada competition launched this week with over $8K in cash and prizes to be awarded for the most innovative embedded systems projects developed using Ada and/or SPARK. The contest runs from September 10, 2019, to January 31, 2020, and participants can register on the Hackster.io developer platform here. ## What’s new? Based on feedback from previous participants, we changed the competition evaluation criteria so projects will now be judged on: • Software quality - Does the software meet its requirements?; • Openness - Is the project open source?; and • “Buzz factor” - Does it have the wow effect to appeal to the software community? Further information about the judging criteria is available here We’ve also increased the amount of prizes to give more projects a chance to win: • One First Prize, in the amount of 2000 (two thousand) USD • Ten Finalist Prizes, in the amount of 600 (six hundred) USD each • One Student-only Prize (an Analog Discovery 2 Pro Bundle worth 299.99 USD) will go to the best-ranking student finalist. A project submitted by a student is eligible for both the Student-only Prize and the cash prizes. Award winners will be announced in March 2020 and project submissions will be evaluated by a judging panel consisting of Bill Wong, Senior Technology Editor at Electronic Design, and Fabien Chouteau, AdaCore software engineer, and author of the Make with Ada blog post series. #### Don’t forget that the new and enhanced GNAT Community 2019 is also available for download for use in your projects! ]]> First Ada Virtual Conference organized by and for the Ada community https://blog.adacore.com/ada-virtual-conferences Thu, 05 Sep 2019 13:34:25 +0000 Maxim Reznik https://blog.adacore.com/ada-virtual-conferences The Ada Community gathered recently around a new exciting initiative - an Ada Virtual Conference, to present Ada-related topics in a 100% remote environment. The first such conference took place on August, 10th 2019, around the topic of the new features in Ada 202x. The conference took the form of a video/audio chat based on the open source platform jitsi.org. No registration required, just access over a web browser or mobile application, with the possibility to participate anonymously. The presentation is just short of 25 minutes and is available on YouTube, Vimeo or via DropBox. Note that while the talk presents the current draft for Ada 202x, some features are still in discussion and may not make it to the standard, or with a different syntax or set of rules. That's in particular the case for all parallelism-related features and iterators (that is, up to slide 16 in the talk) which are being revisited by a group of people within AdaCore, in order to submit recommendations to the Ada Rapporteur Group next year. As the talk concludes, please contribute to the new Ada/SPARK RFCs website if you have ideas about the future of the language! Now the Ada Virtual Conference has a dedicated website, where anyone can vote for the topic of the next event. We invite you to participate! Image by Tomasz Mikołajczyk from Pixabay. ]]> Secure Use of Cryptographic Libraries: SPARK Binding for Libsodium https://blog.adacore.com/secure-use-of-cryptographic-libraries-spark-binding-for-libsodium Tue, 03 Sep 2019 11:54:00 +0000 Isabelle Vialard https://blog.adacore.com/secure-use-of-cryptographic-libraries-spark-binding-for-libsodium The challenge faced by cryptography APIs is to make building functional and secure programs easy for the user. Even with good documentation and examples, this remains a challenge, especially because incorrect use is still possible. I made bindings for two C cryptography libraries, TweetNaCl (pronounce Tweetsalt) and Libsodium, with the goal of making this binding easier to use than the original API by making it possible to detect automatically a large set of incorrect uses. In order to do this, I did two bindings for each library: a low-level binding in Ada, and a higher level one in SPARK, which I call the interface. I used Ada strong-typing characteristics and SPARK proofs to enforce a safe and functional use of the subprograms in the library. In this post I will explain the steps I took to create these bindings, and how to use them. ## Steps to create a binding I will use one program as example: Crypto_Box_Easy from Libsodium, a procedure which encrypts a message. At first I generated a binding using the Ada spec dump compiler: gcc -c -fdump-ada-spec -C ./sodium.h Which gives me that function declaration:  function crypto_box_easy (c : access unsigned_char; m : access unsigned_char; mlen : Extensions.unsigned_long_long; n : access unsigned_char; pk : access unsigned_char; sk : access unsigned_char) return int -- ./sodium/crypto_box.h:61 with Import => True, Convention => C, External_Name => "crypto_box_easy";  Then I modified this binding: First I changed the types used and I added in and out parameters. I removed the access parameter. Scalar parameters with out mode are passed using a temporary pointer on the C side, so it works even without explicit pointers. For unconstrained arrays like Block8 it is more complex. In Ada unconstrained arrays are represented by what is called fat pointers, that is to say a pointer to the bounds of the array and the pointer to the first element of the array. In C the expected parameter is a pointer to the first element of the array. So a simple binding like this one should not work. What saves the situation is this line which forces passing directly the pointer to the first element: with Convention => C; Thus we go from a low-level language where the memory is indexed by pointers to a typed language like Ada:  function crypto_box_easy (c : out Block8; m : in Block8; mlen : in Uint64; n : in out Block8; pk : in Block8; sk : in Block8) return int -- ./sodium/crypto_box.h:61 with Import => True, Convention => C, External_Name => "crypto_box_easy"; After this, I created an interface in SPARK that uses this binding: the goal is to make the same program as in the binding with some modifications. Some useless parameters will be deleted. For instance the C program often asks for an array and the length of this array (like m and mlen), which is useless since the length can be found with the attribute 'Length. Functions with out parameters must be changed into procedures to comply with SPARK rules. Finally new types can be created to take advantage of strong typing, as well as preconditions and postconditions.  procedure Crypto_Box_Easy (C : out Cipher_Text; M : in Plain_Text; N : in out Box_Nonce; PK : in Box_Public_Key; SK : in Box_Secret_Key) with Pre => C'Length = M'Length + Crypto_Box_MACBYTES and then Is_Signed (M) and then Never_Used_Yet (N);  ### How to use strong typing, and why Most of the parameters required by these programs are arrays. Some arrays for messages, others for key, etc. The type Block8 could be enough to represent them all: type Block8 is array (Index range <>) of uint8;  But then anyone could use a key as a message, or a message as a key. To avoid that, in Libsodium I derived new types from Block8. For instance, Box_Public_Key and Box_Secret_Key are the key types used by the Crypto_Box_* programs. The type for messages is Plain_Text, and the type for messages after encryption is Cipher_Text. Thus I take advantage of Ada's strong-typing characteristics in order to enforce the right use of the programs and their parameters. With TweetNaCl, I did things a bit differently: I created the different types directly in the binding, for the same result. Since TweetNaCl is a very small library, it was faster that way. In Libsodium I chose to let my first binding stay as close as possible to the generated one, and to focus on the interface where I use strong-typing and contracts (preconditions and postconditions). ### Preconditions and postconditions Preconditions and postconditions serve the same purpose as derived types: they enforce a specific use of the programs. There are two kinds of conditions: The first kind are mostly conditions on array's length. They are here to ensure the program will not fail. For instance: Pre => C'Length = M'Length + Crypto_Box_MACBYTES; This precondition says that the cipher text should be exactly Crypto_Box_MACBYTES bytes longer than the message that we want to encrypt. If this condition is not filled then execution will fail. Note that we can reference C'Length in the precondition, even though C is an out parameter, because the length attribute of an out parameter that is of an array type is available when the call is made, so we can reason about it in our precondition. The other kind of conditions is used to avoid an unsafe use of the programs. For instance, Crypto_Box uses a Nonce. A Nonce is a small array used as a complement to a key. In theory, to be safe, a key should be long, and used only for one message. However, it is costly to generate a new long key for each message. So we use a long key for every message with a Nonce which is different for each message, but easy to generate. Thus the encryption is safe, but only if we remember to use a different Nonce for each message. To ensure that, I wrote this precondition: Pre => Never_Used_Yet (N) Never_Used_Yet is a ghost function, it means a function that doesn't affect the program's behavior. type Box_Nonce is limited private; function Never_Used_Yet (N: Box_Nonce) return Boolean with Ghost; When GNATprove is used, it sees that procedure Randombytes (Box_Nonce: out N), the procedure that randomly generates the Nonce, has the postcondition Never_Used_Yet (N). So it deduces that when the Nonce is first generated, Never_Used_Yet (N) is true. Thus the first time N is used by Crypto_Box_Open, the precondition is valid. But if N is used a second time, GNATprove cannot prove Never_Used_Yet (N) is still true (because N as the parameter in out, so it could have been changed). That's why it cannot prove a program that calls Crypto_Box_Open twice with the same Nonce: There are ways around this condition: for instance one could copy a random generated Nonce many times to use it on different message. To avoid that, Box_Nonce is declared as limited private: it cannot be copied and GNATprove cannot prove a copied Nonce has the same Never_Used_Yet property as a generated one. Box_Nonce and Never_Used_Yet are declared in the private part of the package with SPARK_Mode Off so that GNATprove treats them as opaque entities: private pragma SPARK_Mode (Off); type Box_Nonce is new Block8 (1 .. Crypto_Box_NONCEBYTES); function Never_Used_Yet (N : Box_Nonce) return Boolean is (True);  Never_Used_Yet always returns true. It is a fake implementation, what matters is that it is hidden for the proof. It works at runtime because it is always used as a positive condition. As a program requirement it is always used as "Never_Used_Yet (N)" and never "not Never_Used_Yet (N)" so the conditions are always valid, and Never_Used_Yet doesn't affect the program's behavior, even if contracts are executed at runtime. Another example of preconditions made with a ghost function is the function Is_Signed (M : Plain_Text). When you want to send an encrypted message to someone, you want this person to be able to check if this message is from you, so no one will be able to steal your identity. To do this, you have to sign your message with Crypto_Sign_Easy, before encrypting it with Crypto_Box_Easy. Trying to skip the signing step leads to a proof error: ## How to use Libsodium_Binding The repository contains: • The project file libsodium.gpr • The library directory lib • The common directory which contains: ◦ The libsodium_binding package, a low-level binding in Ada made from the files generated using the Ada spec dump compiler. ◦ The libsodium_interface package, a higher level binding in SPARK which uses libsodium_binding. • include, a directory which contains the headers of libsodium. • libsodium_body, a directory which contains the bodies. • outside_src, a directory which contains the headers that were removed from include, to fix a problem of double definition. • thin_binding, a directory which contains the binding generated using the Ada spec dump compiler. • The test directory which contains tests for each group of functions. • A testsuite which verifies the same tests as the ones in the test directory. • The examples directory, with examples that use different groups of programs together. It also contains a program where a Nonce is used twice, and as expected it fails at proof stage. outside_src and thin_binding are not used for the binding, but I let them in the repository because it shows what I changed from the original libsodium sources and the Ada generated binding. This project is a library project so directory lib is the only thing necessary. ## How to use TweetNaCl_Binding The repository contains: • The project file tweetnacl.gpr • The common directory which contains: ◦ The tweetnacl_binding package, a low-level binding in Ada made from the files generated using the Ada spec dump compiler. ◦ The tweetnacl_interface package, a higher level binding in SPARK which uses tweetnacl_binding. ◦ tweetnacl.h and tweetnacl.c, the header and the body of the library ◦ randombytes.c, which holds randombytes, a program to generate arrays. • The test directory: test1 and test1b are functional examples of how to use tweetnacl main programs, the others are examples of what happens if you give an array with the wrong size, if you try to use the same nonce twice etc. They fail either at execution or at proof stage. To use this binding, you just have to include the common directory in the Sources of your project file. ]]> Proving a simple program doing I/O ... with SPARK https://blog.adacore.com/proving-a-simple-program-doing-io Tue, 09 Jul 2019 12:37:05 +0000 Joffrey Huguet https://blog.adacore.com/proving-a-simple-program-doing-io The functionality of many security-critical programs is directly related to Input/Output (I/O). This includes command-line utilities such as gzip, which might process untrusted data downloaded from the internet, but also any servers that are directly connected to the internet, such as webservers, DNS servers and so on. However, I/O has received little attention from the formal methods community, and SPARK also does not help the programmer much when it comes to I/O. SPARK has been used to debug functions from the standard library Ada.Text_IO, but this approach lacked support for error handling and didn't allow going up to the application level. As an example, take a look at the current specification of Ada.Text_IO.Put, which only recently has been annotated with some SPARK contracts: procedure Put (File : File_Type; Item : String) with Pre => Is_Open (File) and then Mode (File) /= In_File, Global => (In_Out => File_System); (We have suppressed the postcondition of this function, which talks about line and page length, a functionality of Ada.Text_IO which is not relevant to this blog post.) We can see that Put has a very light contract that protects against some errors, such as calling it on a file that has not been opened for writing, but not against the other many possible errors related to writing, such as a full disk or file permissions. If such an error occurs, Put raises an exception, but SPARK does not allow catching exceptions. So in the above form, even a proved SPARK program that uses Put may terminate with an unhandled exception. Moreover, Put does not specify what exactly is written. For example, one cannot prove in SPARK that two calls to Put with the same File argument write the concatenation of the two Item arguments to the file. This second problem can probably be solved using a stronger contract (though it would be difficult to do so in Ada.Text_IO, whose interface must respect the Ada standard), but together with the first point it means that, even if our program is annotated with a suitable contract and proved, we can only say something informal like “if the program doesn’t crash because of unexpected errors, it respects its contract”. In this blogpost, we propose to solve these issues as follows. We replace Put by a procedure Write which reports errors via its output Has_Written: procedure Write (Fd : int; Buf : Init_String; Num_Bytes : Size_T; Has_Written : out ssize_t); We now can annotate Write with a suitable postcondition that explains exactly what has been written to the file descriptor, and that a negative value of Has_Written signals an error. It is no accident that the new procedure looks like the POSIX system call write. In fact we decided to base this experiment on the POSIX API that consists of open, close, read and write, and simply added very thin SPARK wrappers around those system calls. The advantage of this approach is that system calls never crash (unless there are bugs in the kernel, of course) - they simply flag errors via their return value, or in our case, the out parameter Has_Written. This means that, assuming our program contains a Boolean variable Error that is updated accordingly after invoking the system calls, we can now formally write down the contract of our program in the form: if not Error then ... And if one thinks about it, this is really the best any program dealing with I/O can hope for, because some errors cannot be predicted or avoided by the programmer, such as trying to write to a full disk or trying to open a file that does not exist. We could further refine the above condition to distinguish the behavior of the program depending on the error that was encountered, but we did not do this in the work described here. The other advantage of using a POSIX-like API is that porting existing programs from C to this API would be simpler. We validated this approach by writing a SPARK clone of the “cat” utility. We were able to prove that cat copies the content of the files in its argument to stdout, unless errors were encountered. The project is available following this link. # How to represent the file system? The main interest of the library is to be able to write properties about content, which means that we have to represent this content through something (global variables, state abstraction…). We decided to use maps, that link a file descriptor to an unbounded string representing the content of the corresponding file. Formal_Hashed_Maps were convenient to use because they come with many functions that compare two different maps (e.g Keys_Included_Except, or Elements_Equal_Except). In Ada, even Unbounded_String (from Ada.Strings.Unbounded) are somehow bounded: the maximum length of Unbounded_String is Natural’Last, a length that could be exceeded by certain files. We had to create another library, that relies on Ada.Strings.Unbounded, but would accept appending two unbounded strings whose lengths are maximal. The choice we made was to drop any character that would make the length overflow, e.g. the append function has the following contract: function "&" (L, R : My_Unbounded_String) return My_Unbounded_String with Global => null, Contract_Cases => (Length (L) = Natural'Last => To_String ("&"'Result) = To_String (L), -- When L already has maximal length Length (L) < Natural'Last and then Length (R) <= Natural'Last - Length (L) => To_String ("&"'Result) = To_String (L) & To_String (R), -- When R can be appended entirely others => To_String ("&"'Result) = To_String (L) & To_String (R) (1 .. Natural'Last - Length (L))); -- When R cannot be appended entirely Also, what you can see here is that all properties of unbounded strings are expressed with the conversion to String type: this allows to use the existing axiomatization of strings (through arrays) instead of redefining one. Another design choice was made here: what’s the content of a file we opened in read mode? In stdio.ads, we considered that the content of this file is strictly what we read from it. Because there is no way to know the content of the file, we decided to implement cat as a procedure that would “write whatever it reads”. Any given file could be modified while reading, or, for the case of stdin, parent process may also read part of the data. # The I/O library As said before, the library consists of thin bindings to system calls. Interfacing scalar types (int, size_t, ssize_t) was easy. We also wanted to model more precisely instantiation of buffers. Indeed, when calling Read procedure, the buffer might not be initialized entirely after the call; it is also possible to call Write on a partially initialized String, to copy the first n values that are initialized. The current version of SPARK requires that arrays (and thus Strings) are fully initialized. More details are available in the SPARK User's Guide. A new proposed evolution of the language (see here) is already prototyped in SPARK and allows manipulating safely partially initialized data. In the prototype, we declare a new type, Init_Char, that will have the Init_By_Proof annotation. subtype Init_Char is Character; pragma Annotate(GNATprove, Init_By_Proof, Init_Char); type Init_String is array (Positive range <>) of Init_Char;  With this type declaration, it is now possible to use the attribute ‘Valid_Scalars on Init_Char variables or slices of Init_String variables in Ghost code to specify which scalars have been initialized The next part was writing the bindings to C functions. An example, for Read, is the following: function C_Read (Fd : int; Buf : System.Address; Size : size_t; Offset : off_t) return ssize_t; pragma Import (C, C_Read, "read"); procedure Read (Fd : int; Buf : out Init_String; Has_Read : out ssize_t) is begin Has_Read := C_Read (Fd, Buf'Address, Buf'Length, 0); end Read;  Other than hiding the use of addresses from SPARK, this part was not very difficult. The final part was adding the contracts to our procedures. Firstly, there are no preconditions. System calls may return errors, but they will accept any parameter in input. The only precondition we have in the library is a precondition on Write, that states that the characters that we want to write in the file are initialized. Secondly, every postcondition is a case expression, where we give properties for each possible return value, e.g: procedure Open (File : char_array; Flags : int; Fd : out int) with Global => (In_Out => (FD_Table, Errors.Error_State, Contents)), Post => (case Fd is when -1 => Contents'Old = Contents, when 0 .. int (OPEN_MAX - 1) => Length (Contents'Old) + 1 = Length (Contents) and then Contains (Contents, Fd) and then Length (Element (Contents, Fd)) = 0 and then not Contains (Contents'Old, Fd) and then Model (Contents'Old) <= Model (Contents) and then M.Keys_Included_Except (Model (Contents), Model (Contents'Old), Fd), when others => False); The return value of Open will be either -1, which corresponds to an error, or a natural value (in my case, OPEN_MAX is equal to 1023, 1024 being the maximum number of files that can be open at the same time on my machine). If an error occured, the Contents map is the same as before. If an appropriate file descriptor is returned, the postcondition states that the new Contents map has the same elements as before, plus a new empty unbounded string associated with the file descriptor. With regard to the functional model, these contracts are complete. # The Cat program The cat program is split in two different parts: the main program that opens/closes file(s) in its argument, and calls Copy_To_Stdout. This procedure will read from the input file and write to stdout what it read. Since errors in I/O can happen, we propagate them using a status flag from nested subprograms to the main program and handle them there. This point is the first difference with Ada.Text_IO. The other difference is the presence of postconditions about data, for example in postcondition of Copy_To_Stdout: procedure Copy_To_Stdout (Input : int; Err : out Int) with Post => (if Err = 0 then Element (Contents, Stdout) = Element (Contents’Old, Stdout) & Element (Contents, Input));  This postcondition is the only functional contract we have for cat, this is why it is so important. It states that if no error occurred, the content of stdout is equal to its value before calling Copy_To_Stdout appended to the content we read from the input. If we wanted to write a more precise contract, the main difficulty would be to handle the cases where an error occured. For example, if we call cat on three different files, and one error occurs when copying the second file to stdout, we have no contract about the content of Stdout, and everything becomes unprovable. Adding contracts for these cases would require work on slices and sliding and to do even more case-splitting, which would add more difficulty for the provers. Type definitions and helper subprograms to define the library take about 200 lines of code. The library itself has around 100 lines of contracts. The cat program has 100 lines of implementation and 1200 lines of Ghost code in order to prove everything. Around 1000 verification conditions for the entire project (I/O library + cat + lemmas) are discharged by the solvers to prove everything with auto-active proof. # Conclusion Cat looks like a simple program, and it is. But being able to prove correctness of cat shows that we are able to reason about contents and copying of data (here, between file descriptors), something which is necessary and largely identical for more complex applications like network drivers or servers that listen on sockets. We will be looking at extending our approach to code manipulating (network) sockets. ]]> Using Ada for a Spanish Satellite Project https://blog.adacore.com/using-ada-for-a-spanish-satellite-project Tue, 18 Jun 2019 13:52:00 +0000 Juan Zamorano https://blog.adacore.com/using-ada-for-a-spanish-satellite-project I am an Associate Professor at Polytechnic University of Madrid’s (Universidad Politécnica de Madrid / UPM) in the Department of Architecture and Technology of Computer Systems. For the past several years I have been directing a team of colleagues and students in the development of a UPMSat-2 microsatellite. The project originally started in 2013 as a follow-to the UPM-SAT 1, launched by an Ariane-4 in 1995. The UPMSat-2 weighs 50kg, and its geometric envelope is a parallelepiped with a base measuring 0.5m x 0.5m and a height of 0.6m. The microsatellite is scheduled to be launched September 9, 2019 on a Vega launcher, and is expected to be operational for two years. The primary goals of the project were: • to improve the knowledge of the project participants, both professors and students; • to demonstrate UPM’s capabilities in space technology; • to design, develop, integrate, test, launch and operate a microsatellite in orbit from within a university environment; and • to develop a qualified space platform that can be used for general purpose missions aimed at educational, scientific and technological demonstration applications. The project encompasses development of the software together with the platform, thermal control, attitude control, and other elements. In 2014, we selected AdaCore’s GNAT cross-development environment for the UPMSat-2 microsatellite project’s real-time on-board and ground control software. While Java is the primary language used to teach programming at UPM, Ada was chosen as the main programming language for our project because we considered it the most appropriate to develop high-integrity software. In total, the on-board software consists of more than 100 Ada packages, comprising over 38K lines of code. For the altitude control subsystem, we used C code that was generated automatically from Simulink® models (there are 10 source files in C, with a total of about 1,600 lines of code). We also used database and XML interfaces to support the development of the ground control software. Ada is Easy to Learn Since most of our students are last-year or graduate students, they generally have programming experience. However, they did not have experience with embedded systems or real-time programming, and none of them had any previous experience with GNAT, SPARK or Ada. To teach Ada to our students, we provided them with John Barnes’ Programming in Ada 2012 textbook and spent a fair amount of time with it in the laboratory. The difficult part was not in understanding and using Ada, but rather in understanding the software issues and programming style associated with concurrency, exceptions, real-time scheduling, and large system design. Fortunately, Ada’s high-level concurrency model, simple exception facility, real-time support, and its many features for “programming in the large” (packages, data abstraction, child libraries, generics, etc.) helped to address these difficulties. We also used a GNAT feature that saved us a lot of tedious coding time - the Scalar_Storage_Order attribute. The UPMSat-2’s on-board computer is big-endian and the ground computer is little-endian. Therefore, we had to decode and encode every telemetry and telecommand message to deal with the endianness mismatch. I learned about the Scalar_Storage_Order feature at an AdaCore Tech Day, and it works really well, even for 12-bit packet types. Although it would have been nice to have some additional tool support for things like database and XML interfacing, we found the GNAT environment very intuitive and especially appreciated the GPS IDE; it’s a great tool for developing software. Why Ada? I have been in love with Ada for a long time. I learned programming with Pascal and concurrent Pascal (as well as Fortran and COBOL); I find it frustrating that, at many academic institutions, modular, strongly-typed, concurrent languages such as Ada have sometimes been replaced by others that have much weaker support for good programming practices. While I do not teach programming at UPM, my research group tries to use Ada whenever possible, because we consider it the most appropriate programming language for illustrating the concepts of real-time and embedded systems. I have to say that most of my students have also fallen in love with Ada. Our graduate students in-particular appreciate the value and reliability that Ada brings to their final projects. https://www.adacore.com/press/spanish-satellite-project ]]> RFCs for Ada and SPARK evolution now on GitHub https://blog.adacore.com/rfcs-for-ada-and-spark-evolution-now-on-github Tue, 11 Jun 2019 12:46:54 +0000 Yannick Moy https://blog.adacore.com/rfcs-for-ada-and-spark-evolution-now-on-github Ever wished that Ada was more this and less that? Or that SPARK had such-and-such feature for specifying your programs? Then you're not alone. The Ada-Comment mailing list is one venue for Ada language discussions, but many of us at AdaCore have felt the need for a more open discussion and prototyping of what goes in the Ada and SPARK languages. That's the main reason why we've set up a platform to collect, discuss and process language evolution proposals for Ada and SPARK. The platform is hosted on GitHub, and uses GitHub built-in mechanisms to allow people to propose fixes or evolutions for Ada & SPARK,or give feedback on proposed evolutions. For SPARK, the collaboration between Altran and AdaCore allowed us to completely redesign the language as a large subset of Ada, including now object orientation (added in 2015), concurrency (added in 2016) and even pointers (now available!), but we're reaching the point where catching up with Ada cannot lead us much further, and we need a broader involvement from users to inform our strategic decisions. Regarding Ada, the language evolution has always been a collective effort by an international committee, but here too we feel that more user involvement would be beneficial to drive future evolution, including for the upcoming Ada 202X version. Note that there is no guarantee that changes discussed and eventually prototyped & implemented will ever make it into the Ada standard, even though AdaCore will do its best to collaborate with the Ada Rapporteur Group (ARG). You will see that we've started using the RFC process internally. That's just the beginning. We plan to use this platform much more broadly within AdaCore and the Ada community to evolve Ada and SPARK in the future. Please join us in that collective effort if you are interested! ]]> Using Pointers in SPARK https://blog.adacore.com/using-pointers-in-spark Thu, 06 Jun 2019 12:22:00 +0000 Claire Dross https://blog.adacore.com/using-pointers-in-spark I joined the SPARK team during the big revamp leading to the SPARK 2014 version of the proof technology. Our aim was to include in SPARK all features of Ada 2012 that did not specifically cause problems for the formal verification process. Since that time, I have always noticed the same introduction to the SPARK 2014 language: a subset of Ada, excluding features not easily amenable to formal analysis. Following this sentence was a list of (more notable) excluded features. Over the years, this list of features has started to shrink, as (restricted) support for object oriented programming, tasking and others were added to the language. Up until now, the most notable feature that was still missing, to my sense, was pointer support (or support for access types as they are called in Ada). I always thought that this was a feature we were never going to include. Indeed, absence of aliasing is a key assumption of the SPARK analysis, and removing it would induce so much additional annotation burden for users, that it would make the tool hardly usable. This was what I believed, and this was what we kept explaining to users, who were regretting the absence of an often used feature of the language. I think it was work on the ParaSail language, and the emergence of the Rust language, that first made us look again into supporting this feature. Pointer ownership models did not appear explicitly within Rust, but Rust made the restrictions associated with them look tractable, and maybe even desirable from a safety point of view. So what is pointer ownership? Basically, the idea is that an object designated by a pointer always has a single owner, which retains the right to either modify it, or (exclusive or) share it with others in a read-only way. Said otherwise, we always have either several copies of the pointer which allow only reading, or only a single copy of the pointer that allows modification. So we have pointers, but in a way that allows us to ignore potential aliases… What a perfect fit for SPARK! So we began to look into how ownership rules were enforced in Rust and ParaSail, and how we could adapt some of them for Ada without introducing too many special cases and new annotations. In this post, I will show you what we came up with. Don’t hesitate to comment and tell us what you like / don’t like about this feature. The main idea used to enforce single ownership for pointers is the move semantics of assignments. When a pointer is copied through an assignment statement, the ownership of the pointer is transferred to the left hand side of the assignment. As a result, the right hand side loses the ownership of the object, and therefore loses the right to access it, both for writing and reading. On the example below, the assignment from X to Y causes X to lose ownership on the value it references. As a result, the last assertion, which reads the value of X, is illegal in SPARK, leading to an error message from GNATprove: procedure Test is type Int_Ptr is access Integer; X : Int_Ptr := new Integer'(10); Y : Int_Ptr; -- Y is null by default begin Y := X; -- ownership of X is transferred to Y pragma Assert (Y.all = 10); -- Y can be accessed Y.all := 11; -- both for reading and writing pragma Assert (X.all = 11); -- but X cannot, or we would have an alias end Test; test.adb:9:20: insufficient permission on dereference from "X" test.adb:9:20: object was moved at line 6 In this example, we can see the point of these ownership rules. To correctly reason about the semantics of a program, SPARK needs to know, when a change is made, what are the objects that are potentially impacted. Because it assumes that there can be no aliasing (at least no aliasing of mutable data), the tool can easily determine what are the parts of the environment that are updated by a statement, be it a simple assignment, or for example a procedure call. If we were to break this assumption, we would need to either assume the worst (that all references can be aliases of each other) or require the user to explicitly annotate subprograms to describe which references can be aliased and which cannot. In our example, SPARK can deduce that an assignment to Y cannot impact X. This is only correct because of ownership rules that prevent us from accessing the value of X after the update of Y. Note that a variable which has been moved is not necessarily lost for the rest of the program. Indeed, it is possible to assign it again, restoring ownership. For example, here is a piece of code that swaps the pointers X and Y: declare Tmp : Int_Ptr := X; -- ownership of X is moved to Tmp -- X cannot be accessed. begin X := Y; -- ownership of Y is moved to X -- Y cannot be accessed -- X is unrestricted. Y := Tmp; -- ownership of Tmp is moved to Y -- Tmp cannot be accessed -- Y is unrestricted. end; This code is accepted by the SPARK tool. Intuitively, we can see that writing at top-level into X after it has been moved is OK, since it will not modify the actual owner of the moved value (here Tmp). However, writing in X.all is forbidden, as it would affect Tmp (don’t hesitate to look at the SPARK Reference Manual if you are interested in the formal rules of the move semantics). For example, the following variant is rejected: declare Tmp : Int_Ptr := X; -- ownership of X is moved to Tmp -- X cannot be accessed. begin X.all := Y.all; insufficient permission on dereference from "X" object was moved at line 2 Moving is not the only way to transfer ownership. It is also possible to borrow the ownership of (a part of) an object for a period of time. When the borrower disappears, the borrowed object regains the ownership, and is accessible again. It is what happens for example for mutable parameters of a subprogram when the subprogram is called. The ownership of the actual parameter is transferred to the formal parameter for the duration of the call, and should be returned when the subprogram terminates. In particular, this disallows procedures that move some of their parameters away, as in the following example: type Int_Ptr_Holder is record Content : Int_Ptr; end record; procedure Move (X : in out Int_Ptr_Holder; Y : in out Int_Ptr_Holder) is begin X := Y; -- ownership of Y.Content is moved to X.Content end Move; insufficient permission for "Y" when returning from "Move" object was moved at line 3 Note that I used a record type for the type of the parameters. Indeed, SPARK RM has a special wording for in out parameters of an access type, stating that they are not borrowed but moved on entry and on exit of the subprogram. This allows us to move in out access parameters, which otherwise would be forbidden, as borrowed top-level access objects cannot be moved. The SPARK RM also allows declaring local borrowers in a nested scope by using an anonymous access type: declare Y : access Integer := X; -- Y borrows the ownership of X -- for the duration of the declare block begin pragma Assert (Y.all = 10); -- Y can be accessed Y.all := 11; -- both for reading and writing end; pragma Assert (X.all = 11); -- The ownership of X is restored, -- it can be accessed again  But this is not supported yet by the proof tool, as it raises the complex issue of tracking modifications of X that were done through Y during its lifetime: local borrower of an access object is not yet supported It is also possible to share a single reference between several readers. This mechanism is called observing. When a variable is observed, both the observed object and the observer retain the right to read the object, but none can modify it. As for borrowing, when the observer disappears, the observed object regains the permissions it had before (read-write or read-only). Here is an example. We have a list L, defined as a recursive pointer-based data structure in the usual way. We then observe its tail by introducing a local observer N using an anonymous access to constant type. We then do it again to observe the tail of N: declare N : access constant List := L.Next; -- observe part of L begin declare M : access constant List := N.Next; -- observe again part of N begin pragma Assert (M.Val = 3); -- M can be read pragma Assert (N.Val = 2); -- but we can still read N pragma Assert (L.Val = 1); -- and even L end; end; L.Next := null; -- all observers are out of scope, we can modify L We can see that the three variables retain the right to read their content. But it is OK as none of them is allowed to update it. When no more observers exist, it is again possible to modify L. In addition to single ownership, SPARK restricts the use of access types in several ways. The most notable one is that SPARK does not allow general access types. The reason is that we did not want to deal with accesses to variables defined on the stack and accessibility levels. Also, access types cannot be stored in subcomponents of tagged types, to avoid having access types hidden in record extensions. To get convinced that the rules enforced by SPARK still allow common use cases, I think the best is to look at an example. A common use case for pointers in Ada is to store indefinite types inside data-structures. Indefinite types are types whose subtype is not known statically. It is the case for example for unconstrained arrays. Since the size of an indefinite type is not known statically, it is not possible to store it inside a data-structure, such as another array, or a record. For example, as strings are arrays, it is not possible to create an array that can hold strings of arbitrary length in Ada. The usual work-around consists in adding an indirection via the use of pointers, storing pointers to indefinite elements inside the data structure. Here is an example of how this can now be done in SPARK, for a minimal implementation of a dictionary. A simple vision of a dictionary is an array of strings. Since strings are indefinite, I need to define an access type to be allowed to store them inside an array: type Word is not null access String; type Dictionary is array (Positive range <>) of Word; We can then search for a word in a dictionary. The function below is successfully verified in SPARK. In particular, SPARK is able to verify that no null pointer dereference may happen, due to Word being an access type with null exclusion: function Search (S : String; D : Dictionary) return Natural with Post => (Search'Result = 0 and then (for all I in D'Range => D (I).all /= S)) or else (Search'Result in D'Range and then D (Search'Result).all = S) is begin for I in D'Range loop pragma Loop_Invariant (for all K in D'First .. I - 1 => D (K).all /= S); if D (I).all = S then return I; end if; end loop; return 0; end Search; Now imagine that I want to modify one of the words stored in my dictionary. The words may not have the same length, so I need to replace the pointer in the array. For example: My_Dictionary (1) := new String'("foo"); pragma Assert (My_Dictionary (1).all = "foo"); But this is not great, as now I have a memory leak. Indeed, the value previously stored in My_Dictionary is no longer accessible and it has not been deallocated. The SPARK tool does not currently complain about this problem, even though the SPARK definition says it should (it has not been implemented yet). But let’s try to correct our code nevertheless by storing the value previously in dictionary in a temporary and deallocating it afterward. First I need a deallocation function. In Ada, they can be obtained by instantiating the generic procedure Ada.Unchecked_Deallocation with the appropriate types (note that as access objects are set to null after deallocation, I had to introduce a base type for Word without the null exclusion constraint): type Word_Base is access String; subtype Word is not null Word_Base; procedure Free is new Ada.Unchecked_Deallocation (String, Word_Base); Then, I can try to do the replacement: declare Temp : Word_Base := My_Dictionary (1); begin My_Dictionary (1) := new String'("foo"); Free (Temp); pragma Assert (My_Dictionary (1).all = "foo"); end; Unfortunately this does not work, the SPARK tool complains with: test.adb:36:37: insufficient permission on dereference from "My_Dictionary" test.adb:36:37: object was moved at line 3 Where line 31 is the line where Temp is defined and 36 is the assertion. So, what is happening? In fact, this is due to the way checking of single ownership is done in SPARK. As the analysis used for this verification is not value dependent, when a cell of an array is moved, the tool is never able to determine whether or not ownership to an array cell has been regained. As a result, if an element of an array is moved away, the array will never become readable again unless it is assigned as a whole. Better to avoid moving elements of an array in these conditions, right? So what can we do? If we cannot move, what about borrowing... Let us try with an auxiliary Swap procedure: procedure Swap (X, Y : in out Word_Base) with Pre => X /= null and Y /= null, Post => X /= null and Y /= null and X.all = Y.all'Old and Y.all = X.all'Old is Temp : Word_Base := X; begin X := Y; Y := Temp; end Swap; declare Temp : Word_Base := new String'("foo"); begin Swap (My_Dictionary (1), Temp); Free (Temp); pragma Assert (My_Dictionary (1).all = "foo"); end; Now everything is fine. The ownership on My_Dictionary (1) is temporarily transferred to the X formal parameter of Swap for the duration of the call, and it is restored at the end. Now the SPARK tool can ensure that My_Dictionary indeed has the full ownership of its content after the call, and the read inside the assertion succeeds. This small example is also verified by the SPARK tool. I hope this post gave you a taste of what it would be like to program using pointers in SPARK. If you now feel like using them, a preview is available in the community 2019 edition of GNAT+SPARK. Don’t hesitate to come back to us with your findings, either on GitHub or by email. ]]> GNAT Community 2019 is here! https://blog.adacore.com/gnat-community-2019-is-here Wed, 05 Jun 2019 12:56:00 +0000 Nicolas Setton https://blog.adacore.com/gnat-community-2019-is-here We are pleased to announce that GNAT Community 2019 has been released! See https://www.adacore.com/download. This release is supported on the same platforms as last year: • Windows, Linux, and Mac 64-bit native • RISC-V hosted on Linux • ARM 32 bits hosted on 64-bit Linux, Mac, and Windows GNAT Community now includes a number of fixes and enhancements, most notably: • The installer for Windows and Linux now contains pre-built binary distributions of Libadalang, a very powerful language tooling library for Ada and SPARK. Check out the README for some additional platform-specific notes. We hope you enjoy using SPARK and Ada! ]]> Bringing Ada To MultiZone https://blog.adacore.com/bringing-ada-to-multizone Wed, 29 May 2019 21:30:00 +0000 Boran Car https://blog.adacore.com/bringing-ada-to-multizone # Introduction C is the dominant language of the embedded world, almost to the point of exclusivity. Due to its age, and its goal of being a “portable assembler”, it deliberately lacks type-safety that languages like Ada provide. The lack of type-safety in C is one of the reasons for the commonness of embedded device exploits. Proposed solutions are partitioning the application into smaller intercommunicating blocks, designed with the principle of least privilege in mind; and rewriting the application in a type-safe language. We believe that both approaches are complementary and want to show you how to combine separation and isolation provided by MultiZone together with iteratively rewriting parts in Ada. We will take the MultiZone SDK demo and rewrite one of the zones in Ada. The full demo simulates an industrial application with a robotic arm. It runs on the Arty A7-35T board and interfaces with the PC and a robotic arm (OWI-535 Robotic Arm) via an SPI to USB converter. More details are available from the MultiZone Security SDK for Ada manual (https://github.com/hex-five/multizone-ada/blob/master/manual.pdf). We will just be focusing on the porting process here. ## MultiZone Security MultiZone(TM) Security is the first Trusted Execution Environment for RISC-V - it enables development of a simple, policy-based security environment for RISC-V that supports rich operating systems through to bare metal code. It is a culmination of the embedded security best practices developed over the last decade and now applied to RISC-V processors. Instead of splitting into the secure and non-secure domain, MultiZoneTM Security provides policy-based hardware-enforced separation for an unlimited number of security domains, with full control over data, code and peripherals. MultiZoneTM Security consists of the following components: • MultiZone(TM) nanoKernel - lightweight, formally verifiable, bare metal kernel providing policy-driven hardware-enforced separation of ram, rom, i/o and interrupts. • InterZone(TM) Messenger - communications infrastructure to exchange secure messages across zones on a no- shared memory basis. • MultiZone(TM) Configurator - combines fully linked zone executables with policies and kernel to generate the signed firmware image. • MultiZone(TM) Signed Boot - 2-stage signed boot loader to verify integrity and authenticity of the firmware image (sha-256 / ECC) Contrary to traditional solutions, MultiZone(TM) Security requires no additional hardware, dedicated cores or clunky programming models. Open source libraries, third party binaries and legacy code can be configured in minutes to achieve unprecedented levels of safety and security. See https://hex-five.com/ for more details or check out the MultiZone SDK repository on GitHub - https://github.com/hex-five/multizone-sdk. # Ada on MultiZone We port zone 3, the zone controlling the robotic arm, to Ada. The zone communicates with other zones via MultiZone APIs and with the robotic arm by bitbanging GPIO pins. ## New Runtime MultiZone zones differ from a bare metal applications as access to resources is restricted – a zone has only a portion of the RAM and FLASH and can only access some of the peripherals. Looking at our configuration (https://github.com/hex-five/multizone-ada/blob/master/bsp/X300/multizone.cfg), zone3 has the following access privileges: Zone = 3 # base = 0x20430000; size = 64K; rwx = rx # FLASH base = 0x80003000; size = 4K; rwx = rw # RAM base = 0x0200BFF8; size = 0x8; rwx = r # RTC base = 0x10012000; size = 0x100; rwx = rw # GPIO In the Ada world, this translates to having a separate runtime that we need to create. Luckily, AdaCore has released sources to their existing runtimes on GitHub - https://github.com/adacore/bb-runtimes and they have also included a how-to for creating new runtimes - https://github.com/AdaCore/bb-runtimes/tree/community-2018/doc/porting_runtime_for_cortex_m. Here’s how we create our customized HiFive1 runtime: ./build_rts.py --bsps-only --output=build --prefix=lib/gnat hifive1 This creates sources for building a runtime using a mix of sources from bb-runtimes and from the compiler itself thanks to the --bsps-only flag. Without this switch, we would need the original GNAT repository, which is not publicly available. Notice we don’t use --link, so our runtime sources are a proper copy and can be checked into a new git repo - https://github.com/hex-five/multizone-ada/tree/master/bsp/X300/runtime. Our runtime needs to be compiled and installed before it can be used, and we do that automatically as part of the Makefile for zone3 - https://github.com/hex-five/multizone-ada/blob/master/zone3/Makefile: BSP_BASE := ../bsp PLATFORM_DIR :=$(BSP_BASE)/$(BOARD) RUNTIME_DIR :=$(PLATFORM_DIR)/runtime
GPRBUILD := $(abspath$(GNAT))/bin/gprbuild
GPRINSTALL := $(abspath$(GNAT))/bin/gprinstall

.PHONY: all
all:
$(GPRBUILD) -p -P$(RUNTIME_DIR)/zfp_hifive1.gpr
$(GPRINSTALL) -f -p -P$(RUNTIME_DIR)/zfp_hifive1.gpr --prefix=$(RUNTIME_DIR)$(AR) cr $(RUNTIME_DIR)/lib/gnat/zfp-hifive1/adalib/libgnat.a$(RUNTIME_DIR)/hifive1/zfp/obj/*.o
$(GPRBUILD) -f -p -P zone3.gpr$(OBJCOPY) -O ihex obj/main zone3.hex --gap-fill 0x00

Note that we use a combination of GPRbuild and Make to minimize code differences between the two repositories as much as possible. GPRbuild is limited to zone3 only.

### Board support

We use the Ada Drivers Library on GitHub (https://github.com/AdaCore/Ada_Drivers_Library) as a starting point as it provides examples for a variety of boards and architectures. Our target is the X300, itself a modified HiFive1/FE310/FE300. The differences between FE310/FE300 and X300 are detailed on the multizone-fpga GitHub repository (https://github.com/hex-five/multizone-fpga).

We need to change the drivers for our application slightly - we want LD0 for indicating the status of the robotic arm, red blink when disconnected and green when connected:

with FE310.Device; use FE310.Device;
with SiFive.GPIO; use SiFive.GPIO;

package Board.LEDs is

subtype User_LED is GPIO_Point;

Red_LED   : User_LED renames P01;
Green_LED : User_LED renames P02;
Blue_LED  : User_LED renames P03;

procedure Initialize;
-- MUST be called prior to any use of the LEDs

procedure Turn_On (This : in out User_LED) renames SiFive.GPIO.Set;
procedure Turn_Off (This : in out User_LED) renames SiFive.GPIO.Clear;
procedure Toggle (This : in out User_LED) renames SiFive.GPIO.Toggle;

procedure All_LEDs_Off with Inline;
procedure All_LEDs_On with Inline;
end Board.LEDs;

Having the package named Board allows us to modify the target board at compile-time by just providing the folder containing the Board package. This goes in line with what multizone-sdk does.

### Code

#### MultiZone support

MultiZone nanoKernel offers trap and emulate so existing applications can be provided to MultiZone directly unmodified and will work as expected. Access to RAM, FLASH and peripherals needs to be allowed in the configuration file, though. MultiZone does provide functionality for increased performance, better power usage and interzone communication via the API in Libhexfive. Here we create an Ada wrapper around libhexfive32.a, providing the MultiZone specific calls:

package MultiZone is

procedure ECALL_YIELD; -- libhexfive.h:8
pragma Import (C, ECALL_YIELD, "ECALL_YIELD");

procedure ECALL_WFI; -- libhexfive.h:9
pragma Import (C, ECALL_WFI, "ECALL_WFI");
...
end MultiZone;

A typical MultiZone optimized application will yield whenever it doesn’t have anything to do, to save on processing time and power:

with MultiZone; use MultiZone;

procedure Main is
begin
-- Application initialization
loop
-- Application code
ECALL_YIELD;
end loop;
end Main;

If a zone wants to communicate with other zones, such as receiving commands and sending back replies, it needs to use the ECALL_SEND/ECALL_RECV. These send/receive a chunk of 16 bytes and return a status whether the send/receive was successful. The prototypes are a bit special, as they take a void * parameter, which translates to System.Address:

function ECALL_SEND_C (arg1 : int; arg2 : System.Address) return int; -- libhexfive.h:11
pragma Import (C, ECALL_SEND_C, "ECALL_SEND");
function ECALL_RECV_C (arg1 : int; arg2 : System.Address) return int; -- libhexfive.h:12
pragma Import (C, ECALL_RECV_C, "ECALL_RECV");

We wrap these to provide a more Ada idiomatic alternative:

type Word is new Unsigned_32;
type Message is array (0 .. 3) of aliased Word;
pragma Pack (Message);
subtype Zone is int range 1 .. int'Last;

function Ecall_Send (to : Zone; msg : Message) return Boolean;
function Ecall_Recv (from : Zone; msg : out Message) return Boolean;

We hide the System.Address usage and provide a safer subtype for the source/destination zone, since zone cannot be negative or 0.

function Ecall_Send (to : Zone; msg : Message) return Boolean is
begin
return ECALL_SEND_C (to, msg'Address) = 1;
end Ecall_Send;

function Ecall_Recv (from : Zone; msg : out Message) return Boolean is
begin
return ECALL_RECV_C (from, msg'Address) = 1;
end Ecall_Recv;

With all the primitives in place, we can make a simple MultiZone optimized application that can respond to a ping from another zone:

with HAL;       use HAL;
with MultiZone; use MultiZone;

procedure Main is
begin
-- Application initialization
loop
-- Application code
declare
msg : Message;
Status : Boolean := Ecall_Recv (1, msg);
begin
if Status then
if msg(0) = Character'Pos('p') and
msg(1) = Character'Pos('i') and
msg(2) = Character'Pos('n') and
msg(3) = Character'Pos('g') then
Status := Ecall_Send (1, msg);
end if;
end if;
end;

ECALL_YIELD;
end loop;
end Main;

#### Keeping some legacy (OWI Robot)

We keep the SPI functionality and the Owi Task as C files and just create Ada bindings for them. spi_c.c implements the SPI protocol by bit-banging GPIO pins:

package Spi is

procedure spi_init; -- ./spi.h:8
pragma Import (C, spi_init, "spi_init");

function spi_rw (cmd : System.Address) return UInt32; -- ./spi.h:9
pragma Import (C, spi_rw, "spi_rw");

end Spi;

The OwiTask (owi_task.c) is a state machine containing different robot sequences. The main function is owi_task_run, while others are change the active state. owi_task_run returns the next SPI command to send via SPI for a given moment in time:

package OwiTask is

end OwiTask;

The following Ada code runs the state machine for the OWI robot. Since the target functions are written in C, we see how Ada can interact with legacy C code:

-- OWI sequence run
if usb_state = 16#12670000# then
declare
cmd_bytes : Cmd;
begin
if cmd_word /= -1 then
cmd_bytes(0) := UInt8 (cmd_word and 16#FF#);
cmd_bytes(1) := UInt8 (Shift_Right (cmd_word,  8) and 16#FF#);
cmd_bytes(2) := UInt8 (Shift_Right (cmd_word, 16) and 16#FF#);
ping_timer := CLINT.Machine_Time + PING_TIME;
end if;
end;
end if;

We now port over the owi sequence selection:

declare
Status : Boolean := Ecall_Recv (1, msg);
begin
if Status then
-- OWI sequence select
if usb_state = 16#12670000# then
case msg(0) is
when others => null;
end case;
end if;
...

We leave it as an exercise for the reader to port the OWI sequence handler code to Ada.

We want the LED color to change as a result of the robot connected or disconnected events. Each of the LED colors is a separate pin on the board:

Red_LED   : User_LED renames P01;
Green_LED : User_LED renames P02;
Blue_LED  : User_LED renames P03;

One way of handling this is to create an Access Type that can store the currently selected LED color:

procedure Main is
type User_LED_Select is access all User_LED;
...
LED : User_LED_Select := Red_LED'Access;
...
begin
...
-- Detect USB state every 1sec
if CLINT.Machine_Time > ping_timer then
ping_timer := CLINT.Machine_Time + PING_TIME;
end if;

-- Update USB state
declare
Status : int;
begin
if rx_data /= usb_state then
usb_state := rx_data;

if rx_data = UInt32'(16#12670000#) then
LED := Green_LED'Access;
else
LED := Red_LED'Access;
end if;
end if;
end;

The actual blinking is then just a matter of dereferencing the Access Type and calling the right function:

    -- LED blink
if CLINT.Machine_Time > led_timer then
if GPIO.Set (Red_LED) or GPIO.Set (Green_LED) or GPIO.Set (Blue_LED) then
All_LEDs_Off;
led_timer := CLINT.Machine_Time + LED_OFF_TIME;
else
All_LEDs_Off;
Turn_On (LED.all);
led_timer := CLINT.Machine_Time + LED_ON_TIME;
end if;
end if;

# Closing

If you would like to know more about MultiZone and Ada, please reach out to Hex-Five Security - https://hex-five.com/contact-2/. I will also be attending the RISC-V Workshop in Zurich - https://tmt.knect365.com/risc-v-workshop-zurich/ if you would like to grab a coffee and discuss MultiZone and Ada on RISC-V.

]]>

## Using SPARK to prove absence of run time errors on a hobby project.

The Danish Technical University has a yearly RoboCup where autonomous vehicles solve a number of challenges. Each solved challenge gives points and the team with most points wins. The track is available for testing two weeks before the qualification round.

## The idea behind and creation of RoadRunner for DTU RoboCup 2019.

RoadRunner is a 3D printed robot with wheel suspension, based on the BeagleBone Blue ARM-based board and the Pixy 1 camera with custom firmware enabling real-time line detection. Code is written in Ada and formally proved correct with SPARK at Silver level. SPARK Silver level proves that the code will execute without run-time errors like overflows, race-conditions and deadlocks. During development SPARK prevented numerous hard-to-debug errors to reach the robot. The time-boxed testing period of DTU RoboCup made early error detection even more valuable. In our opinion the estimated time saved on debugging more than outweighs the time we spent on SPARK. The robot may still fail, but it will not be due to run-time errors in our code...

Ada was the obvious choice. We both have years of experience with Ada. The BeagleBone Blue runs Debian Linux and we use the default GNAT FSF compiler version from Debian Testing. It is easy to cross compile from a Debian laptop and it is not too old compared to GNAT Community Edition. The BeagleBone Blue runs Debian Stable.

We decided initially not to use SPARK. According to some internet articles, SPARK had problems proving properties about floating points, and the Ada subset for tasking seemed to be too restricted. Support for floating points has improved since then, and tasking was extended to the Jorvik (extended Ravenscar) profile.

A race condition in the code changed the SPARK decision. A lot of valuable time was spent chasing it. The GNAT runtime on Debian armhf has no traceback info, so it was difficult to find what caused a Storage_Error. SPARK flow analysis detects race conditions, so that would prevent that kind of issues in the future.

It took a lot of effort before SPARK would be able to do flow analysis without errors. SPARK rejects code with exception handlers, fixed-point to floating-point conversions, the standard Ada.Containers and task entries. Hopefully, future versions of SPARK will be able to analyze the above or at least give a warning and continue analysis, even if it contains unsupported Ada features. When the code base is in SPARK, it is quite easy to add new code in SPARK.

## Experience with SPARK.

I had to start from scratch, with no previous experience with SPARK, formal methods or safety critical software. I read the online documentation to find examples on how to prove different types of properties.

SPARK is a tough code reviewer. It detects bad design decisions and forces a rewrite. Bad design is impossible to test and prove and SPARK does not miss any of it. Very valuable for small projects without any code review.

SPARK code is easier to read. Defensive code is moved from exception handlers and if statements to preconditions. The resulting code has less branching and is easier to understand and test.

SPARK code is also more difficult to read. Loop invariants and assertions to prove loops can outnumber the Ada statements in the loop.

We used saturation as a shortcut to prove absence of overflows, in locations where some values could not be proved to stay in bounds. But this is not an ideal solution: if it does saturate, then the result is not correct.

In our experience, proving absence of run time errors also detects real bugs, not only division by zero and overflows.

## Two examples.

SPARK could not prove a resulting value in range. A scaling factor had been divided with instead of multiplied with.

SPARK could not prove array index in range. X and Y on camera image calculation had been swapped.

## Stability.

If code is in SPARK, then all changes must be analyzed by SPARK. Almost all coding error gives a run time exception. That happened several times on the track during testing. But when tested with SPARK, it always found the exact line with the bug. After learning the lesson we found that code analyzed with SPARK always worked the first time.

Having a stable build environment with SPARK and automated tests, made it possible to make successful last-minute changes to the software just before the final run in the video link.

Here is the presentation video for our robot:

and the video of the final winning challenge:

]]>
Using SPARK to prove 255-bit Integer Arithmetic from Curve25519 https://blog.adacore.com/using-spark-to-prove-255-bit-integer-arithmetic-from-curve25519 Tue, 30 Apr 2019 19:00:00 +0000 Joffrey Huguet https://blog.adacore.com/using-spark-to-prove-255-bit-integer-arithmetic-from-curve25519

In 2014, Adam Langley, a well-known cryptographer from Google, wrote a post on his personal blog, in which he tried to prove functions from curve25519-donna, one of his projects, using various verification tools: SPARK, Frama-C, Isabelle... He describes this attempt as "disappointing", because he could not manage to prove "simple" things, like absence of runtime errors. I will show in this blogpost that today, it is possible to prove what he wanted to prove, and even more.

Algorithms in elliptic-curve cryptography compute numbers which are too large to fit into registers of typical CPUs. In this case, curve25519 uses integers ranging from 0 to 2255-19. They can't be represented as native 32- or 64-bit integers. In the original implementations from Daniel J. Bernstein and curve25519-donna, these integers are represented by arrays of 10 smaller, 32-bit integers. Each of these integers is called a "limb". Limbs have alternatively 26-bit and 25-bit length. This forms a little-endian representation of the bigger integer, with low-weight limbs first. In section 4 of this paper, Bernstein explains both the motivation and the details of his implementation. The formula to convert an array A of 32-bit integers into the big integer it represents is: $\sum_{i=0}^{9} A (i) \times 2^{\lceil 25.5i \rceil}$ where the half square brackets represent the ceiling function. I won't be focusing on implementation details here, but you can read about them in the paper above. To me, the most interesting part about this project is its proof.

# First steps

### Types

The first step was to define the different types used in the implementation and proof.

I'll use two index types. Index_Type, which is range 0 .. 9, is used to index the arrays that represent the 255-bit integers. Product_Index_Type is range 0 .. 18, used to index the output array of the Multiply function.

Integer_Curve25519 represents the arrays that will be used in the implementation, so 64-bit integer arrays with variable bounds in Product_Index_Type.  Product_Integer is the array type of the Multiply result. It is a 64-bit integer array with Product_Index_Type range. Integer_255 is the type of the arrays that represent 255-bit integers, so arrays of 32-bit integers with Index_Type range.

### Big integers library

Proving Add and Multiply in SPARK requires reasoning about the big integers represented by the arrays. In SPARK, we don't have mathematical integers (i.e. integers with infinite precision), only bounded integers, which are not sufficient for our use case. If I had tried to prove correctness of Add and Multiply in SPARK with bounded integers, the tool would have triggered overflow checks every time an array is converted to integer.

To overcome this problem, I wrote the specification of a minimal big integers library, that defines the type, relational operators and basic operations. It is a specification only, so it is not executable.

SPARK is based on why, which has support for mathematical integers, and there is a way to link a why3 definition to a SPARK object directly. This is called external axiomatization; you can find more details about how to do it here.

Using this feature, I could easily provide a big integers library with basic functions like "+", "*" or To_Big_Integer (X : Integer). As previously mentioned, this library is usable for proof, but not executable (subprograms are marked as Import).  To avoid issues during binding, I used a feature of SPARK named Ghost code. It takes the form of an aspect: "Ghost" indicates to the compiler that this code cannot affect the values computed by ordinary code, thus can be safely erased. It's very useful for us, since this aspect can be used to write non-executable functions which are only called in annotations.

### Conversion function

One of the most important functions used in the proof is a function that converts the array of 10 integers to the big integer it represents, so when X is an Integer_255 then +X is its corresponding big integer.

function "+" (X : Integer_Curve25519) return Big_Integer
renames To_Big_Integer;

After this, I can define To_Big_Integer recursively, with the aid of an auxiliary function, Partial_Conversion.

function Partial_Conversion
(X : Integer_Curve25519;
L : Product_Index_Type)
return Big_Integer
is
(if L = 0
then (+X(0)) * Conversion_Array (0)
else
Partial_Conversion (X, L - 1) + (+X (L)) * Conversion_Array (L))
with
Ghost,
Pre => L in X'Range;

function To_Big_Integer (X : Integer_Curve25519)
return Big_Integer
is
(Partial_Conversion (X, X'Last))
with
Ghost,
Pre => X'Length > 0;

### Specification of Add and Multiply

With these functions defined, it was much easier to write specifications of Add and Multiply. All_In_Range is used in preconditions to bound the parameters of Add and Multiply in order to avoid overflow.

function All_In_Range
(X, Y     : Integer_255;
Min, Max : Long_Long_Integer)
return     Boolean
is
(for all J in X'Range =>
X (J) in Min .. Max
and then Y (J) in Min .. Max);

function Add (X, Y : Integer_255) return Integer_255 with
Pre  => All_In_Range (X, Y, -2**30 + 1, 2**30 - 1),
Post => +Add'Result = (+X) + (+Y);

function Multiply (X, Y : Integer_255) return Product_Integer with
Pre  => All_In_Range (X, Y, -2**27 + 1, 2**27 - 1),
Post => +Multiply'Result = (+X) * (+Y);

# Proof

Proof of absence of runtime errors was relatively simple: no annotation was added to Add, and a few, very classical loop invariants for Multiply. Multiply is the function where Adam Langley stopped because he couldn't prove absence of overflow. I will focus on proof of functional correctness, which was a much more difficult step. In SPARK, Adam Langley couldn't prove it for Add because of overflow checks triggered by the tool when converting the arrays to the big integers they represent. This is the specific part where the big integers library is useful: it is possible to manipulate big integers in proof without overflow checks.

The method I used to prove both functions is the following:

1.   Create a function that allows tracking the content of the returned array.
2.   Actually track the content of the returned array through loop invariant(s) that ensure equality with the function in point 1.
3.  Prove the equivalence between the loop invariant(s) at the end of the loop and the postcondition in a ghost procedure.
4.  Call the procedure right before return; line.

I will illustrate this example with Add. The final implementation of Add is the following.

function Add (X, Y : Integer_255) return Integer_255 is
Sum : Integer_255 := (others => 0);
begin
for J in Sum'Range loop
Sum (J) := X (J) + Y (J);

pragma Loop_Invariant (for all K in 0 .. J =>
Sum (K) = X (K) + Y (K));
end loop;
return Sum;
end Add;

Points 1 and 2 are ensured by the loop invariant. No new function was created since the content of the array is simple (just an addition). At the end of the loop, the information we have is: "for all K in 0 .. 9 => Sum (K) = X (K) + Y (K)" which is an information on the whole array. Points 3 and 4 are ensured by Prove_Add. Specification of Prove_Add is:

procedure Prove_Add (X, Y, Sum : Integer_255) with
Ghost,
Pre  => (for all J in Sum'Range => Sum (J) = X (J) + Y (J)),
Post => To_Big_Integer (Sum) = To_Big_Integer (X) + To_Big_Integer (Y);

This is what we call a lemma in SPARK. Lemmas are ghost procedures, where preconditions are the hypotheses, and postconditions are the conclusions. Some lemmas are proved automatically, while others require solvers to be guided. You can do this by adding a non-null body to the procedure, and guide them differently, depending on the conclusion you want to prove.

I will talk about the body of Prove_Add in a few paragraphs.

### Multiply

Multiply was much harder than Add had been. But I had the most fun proving it. I refer to the implementation in curve25519-donna as the "inlined" one, because it is fully explicit and doesn't go through loops. This causes a problem in the first point of my method: it is not possible to track what the routine does, except by adding assertions at every line of code. To me, it's not a very interesting point of view, proof-wise. So I decided to change the implementation, to make it easier to understand. The most natural approach for multiplication, in my opinion, is to use distributivity of product over addition. The resulting implementation would be similar to TweetNaCl's implementation of curve25519 (see M function):

function Multiply (X, Y : Integer_255) return Product_Integer is
Product : Product_Integer := (others => 0);
begin
for J in Index_Type loop
for K in Index_Type loop
Product (J + K) :=
Product (J + K)
+ X (J) * Y (K) * (if J mod 2 = 1 and then K mod 2 = 1 then 2 else 1);
end loop;
end loop;
return Product;
end Multiply;

Inside the first loop, J is fixed and we iterate over all values of K, so on all the values of Y. It is repeated over the full range of J, so the entire content of X.  With this implementation, it is possible to track the content of the array in loop invariants through an auxiliary function, Partial_Product, that will track the value of a certain index in the array at each iteration. We add the following loop invariant at the end of the loop:

pragma Loop_Invariant (for all L in 0 .. K =>
Product (J + L) = Partial_Product (X, Y, J, L));

The function Partial_Product is defined recursively, because it is a sum of several factors.

function Partial_Product
(X, Y : Integer_255;
J, K : Index_Type)
return Long_Long_Integer
is
(if K = 9 or else J = 0
then (if J mod 2 = 1 and then K mod 2 = 1 then 2 else 1) * X (J) * Y (K)
else Partial_Product (X, Y, J - 1, K + 1)
+ (if J mod 2 = 1 and then K mod 2 = 1 then 2 else 1) * X (J) * Y (K));

As seen in the loop invariant above, the function returns Product (J + K), which is modified for all indexes L and M when J + K = L + M. The recursive call will be on the pair J + 1, K - 1, or J - 1, K + 1. Given how the function is designed in the implementation, the choice J - 1, K + 1 is preferable, because it follows the evolution of Product (J + K). The base case is when J = 0 or K = 9.

### Problems and techniques

Defining functions to track content was rather easy to do for both functions. Proving that the content is actually equal to the function was a bit more difficult for the Multiply function, but not as difficult as proving the equivalence between this and the postcondition. With these two functions, we challenge the provers in many ways:

• We use recursive functions in contracts, which means provers have to reason inductively and they struggle with this. That's why we need to /guide/ them in order to prove properties that involve recursive functions.
• The context size is quite big, especially for Multiply. Context represent all variables, axioms, theories that solvers are given to prove a check. The context grows with the size of the code, and sometimes it is too big. When this happens, solvers may be lost and not be able to prove the check. If the context is reduced, it will be easier for solvers, and they may be able to prove the previously unproved checks.
• Multiply contracts need proof related to nonlinear integers arithmetics theory, which provers have a lot of problems with.  This problem is specific to Multiply, but it explains why this function took some time to prove. Solvers need to be guided quite a bit in order to prove certain properties.

What follows is a collection of techniques I found interesting and might be useful when trying to prove problems, even very different from this one.

#### Axiom instantiation

When I tried to prove loop invariants in Multiply to track the value of Product (J + K), solvers were unable to use the definition of Partial_Product. Even asserting the exact same definition failed to be proven. This is mainly due to context size: solvers are not able to find the axiom in the search space, and the proof fails. The workaround I found is to create a Ghost procedure which has the definition as its postcondition, and a null body, like this:

procedure Partial_Product_Def
(X, Y : Integer_255;
J, K : Index_Type)
with
Pre  => All_In_Range (X, Y, Min_Multiply, Max_Multiply),
Post =>
(if K = Index_Type'Min (9, J + K)
then Partial_Product (X, Y, J, K)
= (if J mod 2 = 1 and then K mod 2 = 1 then 2 else 1)
* X (J) * Y (K)
else Partial_Product (X, Y, J, K)
= Partial_Product (X, Y, J - 1, K + 1)
+ X (J) * Y (K)
* (if J mod 2 = 1 and then K mod 2 = 1 then 2 else 1));
procedure Partial_Product_Def
(X, Y : Integer_255;
J, K : Index_Type)
is null;

In this case, the context size is reduced considerably, and provers are able to prove it without any body. During the proof process, I had to create other recursive functions that needed this *_Def procedure in order to use their definition. It has to be instantiated manually, but it is a way to "remind" solvers this axiom. A simple but very useful technique.

#### Manual induction

When trying to prove Prove_Add, I encountered one of the cases where the provers have to reason inductively: a finite sum is defined recursively. To help them prove the postcondition, the conversion is computed incrementally in a loop, and the loop invariant tracks the evolution.

procedure Prove_Add (X, Y, Sum : Integer_255) with
Ghost,
Pre  => (for all J in Sum'Range => Sum (J) = X (J) + Y (J)),
Post => To_Big_Integer (Sum) = To_Big_Integer (X) + To_Big_Integer (Y);
--  Just to remember

procedure Prove_Add (X, Y, Sum : Integer_255) with
Ghost
is
X_255, Y_255, Sum_255 : Big_Integer := Zero;
begin
for J in Sum'Range loop

X_255 := X_255 + (+X (J)) * Conversion_Array (J);
Y_255 := Y_255 + (+Y (J)) * Conversion_Array (J);
Sum_255 := Sum_255 + (+Sum (J)) * Conversion_Array (J);

pragma Loop_Invariant (X_255 = Partial_Conversion (X, J));
pragma Loop_Invariant (Y_255 = Partial_Conversion (Y, J));
pragma Loop_Invariant (Sum_255 = Partial_Conversion (Sum, J));
pragma Loop_Invariant (Partial_Conversion (Sum, J) =
Partial_Conversion (X, J) +
Partial_Conversion (Y, J));
end loop;
end Prove_Add;

What makes this proof inductive is the treatment of Loop_Invariants by SPARK: it will first try to prove the first iteration, as an initialization, and then it will try to prove iteration N knowing that the property is true for N - 1.

Here, the final loop invariant is what we want to prove, because when J = 9, it is equivalent to the postcondition. The other loop invariants and variables are used to compute incrementally the values of partial conversions and facilitate proof.

Fortunately, in this case, provers do not need more guidance to prove the postcondition.  Prove_Multiply also follows this proof scheme, but is much more difficult. You can access its proof following this link.

Another case of inductive reasoning in my project is a lemma, whose proof present a very useful technique when proving algorithms using recursive ghost functions in contracts.

procedure Equal_To_Conversion
(A, B : Integer_Curve25519;
L    : Product_Index_Type)
with
Pre  =>
A'Length > 0
and then A'First = 0
and then B'First = 0
and then B'Last <= A'Last
and then L in B'Range
and then (for all J in 0 .. L => A (J) = B (J)),
Post => Partial_Conversion (A, L) = Partial_Conversion (B, L);

It states that given two Integer_Curve25519, A and B, and a Product_Index_Type, L, if A (0 .. L) = B (0 .. L), then Partial_Conversion (A, L) = Partial_Conversion (B, L). The proof for us is evident because it can be proved by induction over L, but we have to help SPARK a bit, even though it's usually simple.

procedure Equal_To_Conversion
(A, B : Integer_Curve25519;
L    : Product_Index_Type)
is
begin
if L = 0 then
return;                         --  Initialization of lemma
end if;
Equal_To_Conversion (A, B, L - 1); --  Calling lemma for L - 1
end Equal_To_Conversion;

The body of a procedure proved by induction actually looks like induction: first thing to add is initialization, with an if statement ending with a return;. Then, the following code is the general case. We call the same lemma for L - 1, and we add assertions to prove postcondition if necessary. In this case, calling the lemma at L - 1 was sufficient to prove the postcondition.

#### Guide with assertions

Even if the solvers know the relation between Conversion_Array (J + K) and Conversion_Array (J) * Conversion_Array (K), it is hard for them to prove properties requiring non-linear arithmetic reasoning. The following procedure is a nice example:

procedure Split_Product
(Old_Product, Old_X, Product_Conversion : Big_Integer;
X, Y                                   : Integer_255;
J, K                                   : Index_Type)
with
Ghost,
Pre  =>
Old_Product
= Old_X * (+Y)
+ (+X (J))
* (if K = 0
then Zero
else Conversion_Array (J) * Partial_Conversion (Y, K - 1))
and then
Product_Conversion
= Old_Product
+ (+X (J)) * (+Y (K))
* (+(if J mod 2 = 1 and then K mod 2 = 1 then 2 else 1))
* Conversion_Array (J + K),
Post =>
Product_Conversion
= Old_X * (+Y)
+ (+X (J)) * Conversion_Array (J)
* Partial_Conversion (Y, K);

The preconditions imply the postcondition by arithmetic reasoning. With a null body, the procedure is not proved. We can guide provers through assertions, by splitting the proof into two different cases:

procedure Split_Product
(Old_Product, Old_X, Product_Conversion : Big_Integer;
X, Y                                   : Integer_255;
J, K                                   : Index_Type)
is
begin
if J mod 2 = 1 and then K mod 2 = 1 then
pragma Assert (Product_Conversion
= Old_Product
+ (+X (J)) * (+Y (K))
* Conversion_Array (J + K) * (+2));
pragma Assert ((+2) * Conversion_Array (J + K)
= Conversion_Array (J) * Conversion_Array (K));
--  Case where Conversion_Array (J + K) * 2
--             = Conversion_Array (J) * Conversion_Array (K).
else
pragma Assert (Conversion_Array (J + K)
= Conversion_Array (J) * Conversion_Array (K));
--  Other case
end if;
if K > 0 then
pragma Assert (Partial_Conversion (Y, K)
= Partial_Conversion (Y, K - 1)
+ (+Y (K)) * Conversion_Array (K));
--  Definition of Partial_Conversion, needed for proof
end if;
end Split_Product;

What is interesting in this technique is that it shows that to prove something with auto-active proof, you need to understand it yourself first. I proved this lemma by hand on paper, which was not difficult, and then I rewrote my manual proof through assertions. If my manual proof is easier when I split two different cases, so it should be for provers. It allows them to choose the first /good/ steps to the proof.

#### Split into subprograms

Another thing I have noticed, is that provers were really quickly overwhelmed by context size in my project. I had a lot of recursive functions in my contracts, but also quantifiers... I did not hesitate to split proofs into Ghost procedures in order to reduce context size, but also wrap some expressions into functions.

I have 7 subprograms that enable me to prove Prove_Multiply, which is not a lot in my opinion. It also increases readability, which is important if you want other people to read your code.

There is also another method I want to share to overcome this problem. When there is only one assertion to prove, but it requires a lot of guidance, it is possible to put all assertions, including the last one in a begin end block, and the last one has a Assert_And_Cut pragma, just like this:

code ...
--  It is possible to add the keyword declare to declare variables
--  before the block, if it helps for proof.
begin
pragma Assert (one assert needed to prove the last assertion);
pragma Assert (another one);
pragma Assert_And_Cut (Assertion to prove and remember);
end;

Assert_And_Cut asks the provers to prove the property inside them, but also will remove all context that has been created in the begin end; block. It will keep just this assertion, and might help to reduce context.

Sadly, this workaround didn't work for my project, because context was already too big to prove the first assertions. But adding a lot of subprograms also has drawbacks, e.g. you have to write preconditions to the procedures, and this may be more difficult than just writing the block with an Assert_And_Cut. Surely, both are useful in various projects, and I think it's nice to have these methods in mind.

# Takeaways

For statistics only: Add has a 8-line implementation for around 25 lines of Ghost code, to prove it. Multiply has 12 lines of implementation for more than 500 lines of proof. And it's done with auto-active proof, which means that verification was all done automatically! In comparison, verification of TweetNaCl's curve25519 has more than 700 lines of Coq in order to prove Multiply for an implementation of 5 lines, but they have a carry operation to prove on top of the product.

I would say this challenge is at an intermediary level, because it is not difficult to understand the proof of Multiply when you write it down. But it presents several techniques that can apply to various other problems, and this is the main interest of the project to me.

As done in the blogpost by Adam Langley, I tried to prove my project with alt-ergo only, because it was the only prover available for SPARK back in 2014. Even today, alt-ergo alone is unable to prove all code. However, it doesn't make alt-ergo a bad prover, in fact, none of the provers available in SPARK (CVC4, Z3 and Alt-Ergo) are able to prove the entire project on their own. I think it shows that having multiple provers available increases greatly chances of code to be proved.

At the beginning, working on this project was just a way to use my big integers library in proof. But in the end, I believe it is an interesting take on the challenge of verifying elliptic curves functions, especially when projects like this appear for example with fiat-crypto or verification of TweetNaCl's curve25519, and I had a lot of fun experimenting with SPARK to prove properties that usually are badly handled by provers. You can access full proof of my project in this repository.

]]>

This course is geared to software professionals looking for a practical introduction to the Ada language with a focus on embedded systems, including real-time features as well as critical features introduced in Ada 2012. By attending this course you will understand and know how to use Ada for both sequential and concurrent applications, through a combination of live lectures from AdaCore's expert instructors and hands-on workshops using AdaCore's latest GNAT technology. AdaCore will provide an Ada 2012 tool-chain and ARM-based target boards for embedded workshops. No previous experience with Ada is required.

The course will be conducted in English.

Prerequisite: Knowledge of a programming language (Ada 83, C, C++, Java…)

Each participant should come with a computer running Windows.

]]>

A question that our users sometimes ask us is "do you use CodePeer at AdaCore and if so, how?". The answer is yes! and this blog post will hopefully give you some insights into how we are doing it for our own needs.

First, I should note that at AdaCore we are in a special situation since we are both developing lots of Ada code, and at the same time we are also developing and evolving CodePeer itself. One of the consequences is that using CodePeer on our own code has actually two purposes: one is to improve the quality of our code by using an advanced static analyzer, and the other is to eat our own dog food and improve the analyzer itself by finding limitations, sub-optimal usage, etc...

In the past few years, we've gradually added automated runs of CodePeer on many of our Ada code base, in addition to the systematic use of light static analysis performed by the compiler: GNAT already provides many useful built-in checks such as all the static Ada legality rules, as well as many clever warnings fine tuned over the past 25 years and available via the -gnatwa compiler switch and coupled with -gnatwe that transforms warnings into errors, not mentioning also the built-in style checks available via the -gnaty switch.

# GNAT

For GNAT sources (Ada front-end and runtime source code) given that this is a large and complex piece of Ada code (compilers do come with a high level of algorithmic complexity and code recursion!) and that we wanted to favor rapid and regular feedback, we've settled on running CodePeer at level 1 with some extra fine tuning: we've found that some categories of messages didn't generate any extra value for the kind of code found in GNAT, so we've disabled these categories via the --be-messages switch. We've also excluded some files from analysis which were taking a long time compared to other files, for little benefits, via the Excluded_Source_Files project attribute as explained in the Partial Analysis section of the User's Guide.

Given that the default build mechanism of GNAT is based on Makefiles, we've written a separate project file for performing the CodePeer analysis, coupled with a separate simple Makefile that performs the necessary setup phase (automatic generation of some source files in the case of GNAT sources).

After this fine tuning, CodePeer generated a few dozen useful messages that we analyzed and allowed us to fix potential issues, as well as improve the general quality of the code. You'll find a few examples below. The most useful findings on the GNAT code base are related to redundant or dead code and potential access to uninitialized variables when complex conditions are involved, which is often the case in a compiler! CodePeer also detected useful cases of code cleanups in particular related to wrong parameter modes: in out parameters that should be out or inout parameters that should be in out.

Once we've addressed these findings by improving the source code, we ended up in a nice situation where the CodePeer run was completely clean: no new messages found. So we've decided on a strict policy similar to our no warning policy: no CodePeer messages should be left alone moving forward. To ensure that, we've put in place a continuous run that triggers after each commit in the GNAT repository and reports back its findings within half an hour.

One of the most useful categories of potential issues found by CodePeer in our case is related to not always initializing a variable before using it (what CodePeer calls validity checks). For example in gnatchop.adb we had the following code:

   Offset_FD : File_Descriptor;
[...]

exception
when Failure | Types.Terminate_Program =>
Close (Offset_FD);
[...]

CodePeer detected that at line 6 Offset_FD may not have been initialized (precisely because an exception may have been raised before assigning Offset_FD. We fixed this potential issue by explicitly assigning a default value and testing for it:

   Offset_FD : File_Descriptor := Invalid_FD;
[...]

exception
when Failure | Types.Terminate_Program =>
if Offset_FD /= Invalid_FD then
Close (Offset_FD);
end if;
[...]

CodePeer also helped detect suspicious or dead code that should not have been there in the first place. Here is an example of code smell that CodePeer detected in the file get_scos.adb:

procedure Skip_EOL is
C : Character;
begin
loop
Skipc;
C := Nextc;
exit when C /= LF and then C /= CR;

if C = ' ' then
Skip_Spaces;
C := Nextc;
exit when C /= LF and then C /= CR;
end if;
end loop;
end Skip_EOL;

in the code above, CodePeer complained that the test at line 9 is always false because C is either LF or CR. And indeed, if you look closely at line 7, when the code goes past this line, then C will always be either CR or LF, and therefore cannot be a space character. The code was simplified into:

   procedure Skip_EOL is
C : Character;
begin
loop
Skipc;
C := Nextc;
exit when C /= LF and then C /= CR;
end loop;
end Skip_EOL;

# GPS

Analysis of GPS sources with CodePeer is used at AdaCore both for improving the code quality and also to test our integration with the SonarQube tool via our GNATdashboard integration.

For GPS, we are using CodePeer in two modes: one mode where developers can manually run CodePeer on their local set up at level 0. This mode runs in about 3 to 5 minutes to analyze all the GPS sources and runs the first level of checks provided by CodePeer, which is a very useful complement to the compiler warnings and style checks and allows to stay 100% clean of messages after an initial set of code cleanups.

In addition, an automated run is performed nightly on a server using level 1, further tuned in a similar way to what we did for GNAT. Here we have some remaining messages under analysis and we use SonarQube to track and analyze these messages.

Here is an example of code that looked suspicious in GPS sources:

   View : constant Registers_View := Get_Or_Create_View (...);
begin
View.Locked := True;

if View /= null then
[...]
end if;

CodePeer complained at line 5 that the test is always True since View cannot be null at this point. Why? Because at line 3 we are already dereferencing View, so CodePeer knows that after this point either an exception was raised or, if not, View cannot be null anymore.

In this case, we've replaced the test by an explicit assertion since it appears that Get_Or_Create_View can never return null:

   View : constant Registers_View := Get_Or_Create_View (...);
begin
pragma Assert (View /= null);
View.Locked := True;
[...]

# Run Time Certification

As part of a certification project of one of our embedded runtimes for a bare metal target, we ran CodePeer at its highest level (4) in order to detect all potentially cases of a number of vulnerabilities and in particular: validity checks, divide by zero, overflow checks, as well as confirming that the runtime did not contain dead code or unused assignments. CodePeer was run manually and then all messages produced were reviewed and justified, as part of our certification work.

# GNAT LLVM

As part of the GNAT LLVM project (more details on this project if you're curious in a future post!) in its early stages, we ran CodePeer manually on all the GNAT LLVM sources - excluding the sources common to GNAT, already analyzed separately - initially at level 1 and then at level 3, and we concentrated on analyzing all (and only) the potential uninitialized variables (validity checks). In this case we use the same project file used to build GNAT LLVM itself and added the CodePeer settings which basically looked like:

package CodePeer is
for Switches use ("-level", "3", "--be-messages=validity_check", "--no-lal-checkers");
for Excluded_Source_Dirs use ("gnat_src", "obj");
end CodePeer;

which allowed us to perform a number of code cleanups and review more closely the code pointed by CodePeer.

# CodePeer

Last but not least, we also run CodePeer on its own code base! For CodePeer given that this is another large and complex piece of Ada code and we wanted to favor rapid and regular feedback, we've settled on a setting similar to GNAT: level 1 with some extra fine tuning via --be-messages. We also added a few pragma Annotate to both justify some messages - pragma Annotate (CodePeer, False_Positive) - as well as skipping analysis of some subprograms or files where CodePeer was taking too long to analyze, for little benefits (via either the Excluded_Source_Files project attribute or pragma Annotate (CodePeer, Skip_Analysis)).

A CodePeer run is triggered after each change in the repository in a continuous builder and made available to the team within 30 minutes. We've found that in this case the most interesting messages where: validity checks on local variables and out parameters, test always true/false, duplicated code and potential wrong parameter mode.

We also run CodePeer on other code bases in a similar fashion, such as analysis of the SPARK tool sources.

As part of integrating CodePeer in our daily work, we also took the opportunity to improve the documentation and describe many possible workflows corresponding to the various needs of teams wanting to analyze Ada code, with an explanation on how to put these various scenarios in place, check Chapter 5 of the User's Guide if you're curious.

What about you? Do not hesitate to tell us how you are using CodePeer and what are the most useful benefits you are getting by commenting below.

]]>
Ten Years of Using SPARK to Build CubeSat Nano Satellites With Students https://blog.adacore.com/ten-years-of-using-spark-to-build-cubesat-nano-satellites-with-students Fri, 01 Mar 2019 19:31:01 +0000 Peter Chapin https://blog.adacore.com/ten-years-of-using-spark-to-build-cubesat-nano-satellites-with-students

My colleague, Carl Brandon, and I have been running the CubeSat Laboratory at Vermont Technical College (VTC) for over ten years. During that time we have worked with nearly two dozen students on building and programming CubeSat nano satellites. CubeSats are small (usually 10cm cube), easily launched spacecraft that can be outfitted with a variety of cameras, sensing instruments, and communications equipment. Many CubeSats are built by university groups like ours using students at various skill levels in the design and production process.

Students working in the CubeSat Laboratory at VTC have been drawn from various disciplines including computer engineering, electro-mechanical engineering, electrical engineering, and software engineering. VTC offers a masters degree in software engineering, and two of our MSSE students have completed masters projects related to CubeSat flight software. In fact, our current focus is on building a general purpose flight software framework called CubedOS.

Like all spacecraft CubeSats are difficult to service once they are launched. Because of the limited financial resources available to university groups, and because of on-board resource constraints, CubeSats typically don't support after-launch uploading of software updates. This means the software must be fully functional and fault-free, with no possibility of being updated, at the time of launch

Many university CubeSat missions have failed due to software errors. This is not surprising considering that most flight software is written in C, a language that is difficult to use correctly. To mitigate this problem we use the SPARK dialect of Ada in all of our software work. Using the SPARK tools we work toward proving the software free of runtime error, meaning that no runtime exceptions will occur. However, in general we have not attempted to prove functional correctness properties, relying instead on conventional testing for that level of verification.

Although we do have some graduate students working in the CubeSat Laboratory, most of our students are third and fourth year undergraduates. The standard curriculum for VTC's software engineering program includes primarily the Java and C languages. Although I have taught Ada in the classroom, it has only been in specialized classes that a limited number of students take. As a result most of the students who come to the CubeSat Laboratory have no prior knowledge of SPARK or Ada, although they have usually taken several programming courses in other languages.

Normally I arrange to give new students an intensive SPARK training course that spans three or four days, with time for them to do some general exercises. After that I assign the students a relatively simple introductory task involving our codebase so they can get used to the syntax and semantics of SPARK without the distraction of complex programming problems. Our experience has been that good undergraduate students with a reasonable programming background can become usefully productive with SPARK in as little as two weeks. Of course, it takes longer for them to gain the skills and experience to tackle the more difficult problems, but the concern commonly expressed by some that the lack of SPARK skills in a programming team is a barrier to the adoption of SPARK is not validated by our experience

Students are, of course, novice programmers almost by definition. Many of our students are in the process of learning basic software engineering principles such as the importance of requirements, code review, testing, version control, continuous integration, and many other things. Seeing these ideas in the context of our CubeSat work gives them an important measure of realism that can be lacking in traditional courses.

However, because of their general inexperience, and because of the high student turnover rate that is natural in an educational setting, our development process is often far from ideal. Here SPARK has been extremely valuable to us. What we lack in rigor of the development process we make up for in the rigor of the SPARK language and tools. For example, if a student refactors some code, perhaps without adequately communicating with the authors of the adjoining code, the SPARK tools will often find inconsistencies that would otherwise go unnoticed. This has resulted in a much more disciplined progression of the code than one would expect based on the team's overall culture. Moreover the discipline imposed by SPARK and its tools can serve to educate the students about the kinds of issues that can go wrong by following an overly informal approach to development.

For example, one component of CubedOS is a module that sends “tick” messages to other modules on request. These messages are largely intended to trigger slow, non-timing critical housekeeping tasks in the other modules. For flexibility the module supports both one-shot tick messages that occur only once and periodic tick messages. Many “series” of messages can be active at once. The module can send periodic tick messages with different periods to several receivers with additional one-shot tick messages waiting to fire as well.

The module has been reworked many times. At one point it was decided that tick messages should contain a counter that represents the number of such messages that have been sent in the series. Procedure Next_Ticks below is called at a relatively high frequency to scan the list of active series and issue tick messages as appropriate. I asked a new student to add support for the counters to this system as a simple way of getting into our code and helping us to make forward progress. The version produced was something like:

      procedure Next_Ticks is
begin
-- Iterate through the array to see who needs a tick message.
for I in Series_Array'Range loop
declare
Current_Series : Series_Record renames Series_Array(I);
begin
-- If we need to send a tick from this series...
if Current_Series.Is_Used and then Current_Series.Next <= Current_Time then
Route_Message
Request_ID => 0,
Series_ID  => Current_Series.ID,
Count      => Current_Series.Count));

-- Update the current record.
case Current_Series.Kind is
when One_Shot =>
-- TODO: Should we reinitialize the rest of the series record?
Current_Series.Is_Used := False;

when Periodic =>
Current_Series.Count := Current_Series.Count + 1;
Current_Series.Next :=
Current_Series.Next + Current_Series.Interval;
end case;
end if;
end;
end loop;
end Next_Ticks;

The code iterates over an array of Series_Record objects looking for the records that are active and are due to be ticked (Current_Series.Next <= Current_Time). For those records, it invokes the Route_Message procedure on an appropriately filled in Tick Reply message as created by function Tick_Reply_Encode. The student had to modify the information saved for each series to contain a counter, modify the Tick Reply message to contain a count, and update Tick_Reply_Encode to allow for a count parameter. The student also needed to increment the count as needed for periodic messages. This involved relatively minor changes to a variety of areas in the code.

The SPARK tools ensured that the student dealt with initialization issues correctly. But they also found an issue in the code above with surprisingly far reaching consequences. In particular, SPARK could not prove that no runtime error could ever occur on the line

    Current_Series.Count := Current_Series.Count + 1;

Of course, this will raise Constraint_Error when the count overflows the range of the data type used. If periodic tick messages are sent for too long, the exception would eventually be raised, potentially crashing the system. Depending on the rate of tick messages, that might occur in as little as a couple of years, a very realistic amount of time for the duration of a space mission. It was also easy to see how the error could be missed during ordinary testing. A code review might have discovered the problem, but we did code reviews infrequently. On the other hand, SPARK made this problem immediately obvious and led to an extensive discussion about what should be done in the event of the tick counters overflowing.

In November 2013 we launched a low Earth orbiting CubeSat. The launch vehicle contained 13 other university built CubeSats. Most were never heard from. One worked for a few months. Ours worked for two years until it reentered Earth's atmosphere as planned in November 2015. Although the reasons for the other failures are not always clear, software problems were known to be an issue in at least one of them and probably for many others. We believe the success of our mission, particularly in light of the small size and experience of our student team, is directly attributable to the use of SPARK.

]]>

MISRA C is the most widely known coding standard restricting the use of the C programming language for critical software. For good reasons. For one, its focus is entirely on avoiding error-prone programming features of the C programming language rather than on enforcing a particular programming style. In addition, a large majority of rules it defines are checkable automatically (116 rules out of the total 159 guidelines), and many tools are available to enforce those. As a coding standard, MISRA C even goes out of its way to define a consistent sub-language of C, with its own typing rules (called the "essential type model" in MISRA C) to make up for the lack of strong typing in C.

That being said, it's still a long shot to call it a security coding standard for C. Even when taking into account the 14 additional guidelines focusing on security of the "MISRA C:2012 - Amendment 1: Additional security guidelines for MISRA C:2012". MISRA C is first and foremost focusing on software quality, which has obvious consequences for security, but programs in MISRA C remain for the most part vulnerable to the major security vulnerabilities that plague C programs.

In particular, it's hard to state what guarantees are obtained when respecting the MISRA C rules (which means essentially respecting the 116 decidable rules enforced automatically by analysis tools). In order to clarify this, and to present at the same time how guarantees can be obtained using a different programming language, we have written a book available online. Even better, we host on our e-learning website an interactive version of the book where you can compile C or Ada code, and analyze SPARK code, to experiment how a different language with its associated analysis toolset can go beyond what MISRA C allows.

So that, even if MISRA C is the best thing that could happen to C, you can decide if C is really the best thing that could happen to your software.

]]>

Like last year, we've sent a squad of AdaCore engineers to participate in the celebration of Open Source software at FOSDEM. Like last year, we had great interactions with the rest of the Ada and SPARK Community in the Ada devroom on Saturday. You can check the program with videos of all the talks here. This year's edition was particularly diverse, with an academic project from Austria for an autonomous train control in Ada, two talks on Database development and Web development made type-safe with Ada, distributed computing, libraries, C++ binding, concurrency, safe pointers, etc.

We also had a talk in the RISC-V devroom:

And there was a related talk in the Security devroom on the use of SPARK for security:

Hope to see you at FOSDEM next year!

]]>

In Part 1 of this blog post I discussed why I chose to implement this application using the Ada Web Server to serve the computed fractal to a web browser. In this part I will discuss a bit more about the backend of the application, the Ada part.

## Why do we care about performance?

The Ada backend will compute a new image each time one is requested from the front-end. The front-end immediately requests a new image right after it receives the last one it requested. Ideally, this will present to the user as an animation of a Mandelbrot fractal changing colors. If the update is too slow, the animation will look slow and terrible. So we want to minimize the compute time as much as possible. We have a few ways to do that: optimize the computation, and parallelize the computation.

## Parallelizing the Fractal Computation

An interesting feature of the Mandelbrot calculation is that the computation of any pixel is independent of any other pixel. That means we can completely parallelize the computation of each pixel! So if our requested image is 1920x1280 pixels we can spawn 2457600 tasks right? Theoretically, yes we can do that. But it doesn’t necessarily speed up our application compared to, let’s say, 8 or 16 tasks, each of which computes a row or selection of rows. Either way, we know we need to create what’s called a task pool, or a group of tasks that can be queued up as needed to do some calculations. We will create our task pool by creating a task type, which will implement the actual activity that the task will complete, and we will use a Synchronous_Barrier to synchronize all of the tasks back together.

task type Chunk_Task_Type is
pragma Priority (0);
entry Go (Start_Row : Natural;
Stop_Row  : Natural;
Buf       : Stream_Element_Array_Access);

Start_Row : Natural;
Stop_Row  : Natural;
end record;

S_Sync_Obj  : Synchronous_Barrier (Release_Threshold => Task_Pool_Size + 1);

This snippet is from fractal.ads. S_Task_Pool will be our task pool and S_Sync_Obj will be our synchronization object. If you notice, each Chunk_Task_Type in the S_Task_Pool takes an access to a buffer in its Go entry procedure. In our implementation, each task will have an access to the same buffer. Isn’t this a race condition? Shouldn’t we use a protected object?

## The downside of protected object

The answer to both of those questions, is yes. This is a race condition and we should be using a protected object. If you were to run CodePeer on this project, it identifies this as a definite problem. But using a protected object is going to destroy our performance. The reason for this is because under the hood, the protected object will be using locks each time we access data from the buffer. Each lock, unlock, and wait-on-lock call is going to make the animation of our fractal look worse and worse. However, there is a way to get around this race-condition issue. By design, we can guarantee that each task is going to access the buffer from Start_Row to Stop_Row. So, by design we can make sure that each task’s rows don’t overlap another task’s rows thereby avoiding a race condition.

## Parallelization Implementation

Now that we understand the specification of the task pool, let’s look at the implementation.

task body Chunk_Task_Type
is
Start   : Natural;
Stop    : Natural;
Buffer  : Stream_Element_Array_Access;
Notified : Boolean;
begin

loop
accept Go (Start_Row : Natural;
Stop_Row : Natural;
Buf : Stream_Element_Array_Access) do
Start := Start_Row;
Stop := Stop_Row;
Buffer := Buf;
end Go;

for I in Start .. Stop loop

Calculate_Row (Y      => I,
Idx    => Buffer'First +
Stream_Element_Offset ((I - 1) *
Get_Width * Pixel'Size / 8),
Buffer => Buffer);
end loop;
Wait_For_Release (The_Barrier => S_Sync_Obj,
Notified    =>  Notified);
end loop;
end Chunk_Task_Type;

## The bigger the task pool…

Now we come back to the size of the task pool. Based on the implementation above we can chunk the processing up to the granularity of at least the number of rows being requested. So in the case of a 1920x1280 image, we could have 1280 tasks! But we have to ask ourselves, is that going to give us better performance than 8 or 16 tasks? The answer, unfortunately, is probably not. If we create 8 tasks, and we have an 8 core processor, we can assume that some of our tasks are going to execute on different cores in parallel. If we create 1280 tasks and we use the same 8 core processor, we don’t get much more parallelization than with 8 tasks. This is a place where tuning and best judgement will give you the best performance.

## Fixed vs Floating Point

Now that we have the parallelization component, let’s think about optimizing the maths. In most fractal computations floating point complex numbers are used. Based on our knowledge of processors, we can assume that in most cases floating point calculations will be slower than integer calculations. So theoretically, using fixed point numbers might give us better performance. For more information on Ada fixed point types check out the Fixed-point types section of the Introduction to Ada course on learn.adacore.com .

## The Generic Real Type

Because we are going to use the same algorithm for both floating and fixed point math, we can implement the algorithm using a generic type called Real. The Real type is defined in computation_type.ads.

generic
type Real is private;
with function "*" (Left, Right : Real) return Real is <>;
with function "/" (Left, Right : Real) return Real is <>;
with function To_Real (V : Integer) return Real is <>;
with function F_To_Real (V : Float) return Real is <>;
with function To_Integer (V : Real) return Integer is <>;
with function To_Float (V : Real) return Float is <>;
with function Image (V : Real) return String is <>;
with function "+" (Left, Right : Real) return Real is <>;
with function "-" (Left, Right : Real) return Real is <>;
with function ">" (Left, Right : Real) return Boolean is <>;
with function "<" (Left, Right : Real) return Boolean is <>;
with function "<=" (Left, Right : Real) return Boolean is <>;
with function ">=" (Left, Right : Real) return Boolean is <>;
package Computation_Type is

end Computation_Type;

We can then create instances of the Julia_Set package using a floating point and fixed point version of the computation_type package.

type Real_Float is new Float;

function Integer_To_Float (V : Integer) return Real_Float is
(Real_Float (V));

function Float_To_Integer (V : Real_Float) return Integer is
(Natural (V));

function Float_To_Real_Float (V : Float) return Real_Float is
(Real_Float (V));

function Real_Float_To_Float (V : Real_Float) return Float is
(Float (V));

function Float_Image (V : Real_Float) return String is
(V'Img);

D_Small : constant := 2.0 ** (-21);
type Real_Fixed is delta D_Small range -100.0 .. 201.0 - D_Small;

function "*" (Left, Right : Real_Fixed) return Real_Fixed;
pragma Import (Intrinsic, "*");

function "/" (Left, Right : Real_Fixed) return Real_Fixed;
pragma Import (Intrinsic, "/");

function Integer_To_Fixed (V : Integer) return Real_Fixed is
(Real_Fixed (V));

function Float_To_Fixed (V : Float) return Real_Fixed is
(Real_Fixed (V));

function Fixed_To_Float (V : Real_Fixed) return Float is
(Float (V));

function Fixed_To_Integer (V : Real_Fixed) return Integer is
(Natural (V));

function Fixed_Image (V : Real_Fixed) return String is
(V'Img);

package Fixed_Computation is new Computation_Type (Real       => Real_Fixed,
"*"        => Router_Cb."*",
"/"        => Router_Cb."/",
To_Real    => Integer_To_Fixed,
F_To_Real  => Float_To_Fixed,
To_Integer => Fixed_To_Integer,
To_Float   => Fixed_To_Float,
Image      => Fixed_Image);

package Fixed_Julia is new Julia_Set (CT               => Fixed_Computation,
Escape_Threshold => 100.0);

package Fixed_Julia_Fractal is new Fractal (CT              => Fixed_Computation,
Calculate_Pixel => Fixed_Julia.Calculate_Pixel,

package Float_Computation is new Computation_Type (Real       => Real_Float,
To_Real    => Integer_To_Float,
F_To_Real  => Float_To_Real_Float,
To_Integer => Float_To_Integer,
To_Float   => Real_Float_To_Float,
Image      => Float_Image);

package Float_Julia is new Julia_Set (CT               => Float_Computation,
Escape_Threshold => 100.0);

package Float_Julia_Fractal is new Fractal (CT              => Float_Computation,
Calculate_Pixel => Float_Julia.Calculate_Pixel,
Task_Pool_Size  => Task_Pool_Size);

We now have the Julia_Set package with both the floating point and fixed point implementations. The AWS URI router is set up to serve a floating point image if the GET request URI is “/floating_fractal” and a fixed point image is the request URI is “/fixed_fractal”.

## The performance results

Interestingly, fixed point was not unequivocally faster than floating point in every situation. On my 64 bit mac, the floating point was slightly faster. On a 64 bit ARM running QNX the fixed point was faster. Another phenomenon I noticed is that the fixed point was less precise with little performance gain in most cases. When running the fixed point algorithm, you will notice what looks like dust in the image. That is integer overflow or underflow instances on the pixel. Under normal operation, these would manifest as runtime exceptions in the application, but to increase performance, I compiled with those checks turned off.

## Fixed point - the tradeoff

Here’s the takeaway from this project: fixed point math performance is only better if you need lower precision OR a limited range of values. Keep in mind that this is an OR situation. You can either have a precise fixed point number with a small range, or a imprecise fixed point number with a larger range. If you try to have a precise fixed point number with a large range, the underlying integer type that is used will be quite large. If the type requires a 128 bit or larger integer type, then you lose all performance you would have gained by using fixed point in the first place.

In the case of the fractal, the fixed point is useless because we need a high precision with a large range. In this respect, we are stuck with floating point, or weird looking dusty fixed point fractals.

## Conclusion

Although this was an interesting exercise to compare the two types of performances in a pure math situation, it isn’t entirely meaningful. If this was a production application, we would be optimizing the algorithms for the types being used. In this case, the algorithm was very generic and didn’t account for the limited range or precision that would be necessary for an optimized fixed point type. This is why the floating point performance was comparable in most cases to the fixed point with better visual results.

However, there are a few important design paradigms that were interesting to implement. The task pool is a useful concept for any parallelized application such as this. And the polymorphism via generics that we used to create the Real type can be extremely useful to give you greater flexibility to have multiple build or feature configurations.

With Object Oriented programming languages like Ada and C++ it is sometimes tempting to start designing an application like this using base classes and derivation to implement new functionality. In this application that wouldn’t have made sense because we don’t need dynamic polymorphism; everything is known at compile time. So instead, we can use generics to achieve the polymorphism without creating the overhead associated with tagged types and classes.

]]>

The AdaFractal project is exactly what it sounds like; fractals generated using Ada. The code, which can be found here , calculates a Mandelbrot set using a fairly common calculation. Nothing fancy going on here. However, how are we going to display the fractal once its calculated?

## To Graphics Library or not to Graphics Library?

We could use GTKAda, the Ada binding to GTK, to display a window… But that means we are stuck with GTK which really is overkill for displaying our little image, and is only supported on Windows and Linux... And Solaris if you’re into that sort of thing...

What’s more portable and easier to handle than GTK? Well, doing a GUI in web based technology is very easy. With a few lines of HTML, some Javascript, and some simple CSS we can throw together an entire Instagram of fractals... Fractalgram? Does that exist already? If it doesn’t, someone should get on that.

## The Server Solution

The last piece of the puzzle we need is to be able to serve our web files and fractal image to a web browser for rendering. Since we want to ensure portability, we should probably leverage the inherent portability of Ada and use AWS. No, it’s not the same aws that you may use for cloud storage. This AWS is the Ada Web Server which is a small and powerful HTTP server that has support for SOAP/WSDL, Server Push, HTTPS/SSL and client HTTP. If you want to know more about the other aws, check out the blog post Amazon Relies on Formal Methods for the Security of AWS. For the rest of this blog series, let’s assume AWS stands for Ada Web Server.

Because AWS is written entirely in Ada, it is incredibly portable. The current build targets include UNIX-like OS’s, Windows, macOS, and VxWorks for operating systems and ARM, x86, and PPC for target architectures. Just by changing the compiler I have been able to recompile and run this application, with no source code differences, for QNX on 64 bit ARM, macOS, Windows, and Linux on 64 bit x86, and VxWorks 6.9 on 32 bit ARM. That’s pretty portable!

For those who have access to GNATPro on Windows, Linux, or macOS, you can find AWS included with the toolsuite installation. For GNAT Community users, or GNATPro users who would like to compile AWS for a different target, you can build the library from the source located on the AdaCore GitHub.

## Hooking up the pieces

Hooking up the fractal application to AWS was pretty easy. It only required creating a URI router which takes incoming GET requests and dispatches them to the correct worker function. The full router tree can be found in the router_cb.adb file.

function Router (Request : AWS.Status.Data) return AWS.Response.Data
is
URI      : constant String := AWS.Status.URI (Request);
Filename : constant String := "web/" & URI (2 .. URI'Last);
begin
--  Main server access point
if URI = "/" then
--  main page
return AWS.Response.File (AWS.MIME.Text_HTML, "web/html/index.html");

--  Requests a new image from the server
elsif URI = "/fractal" then
return AWS.Response.Build
(Content_Type  => AWS.MIME.Application_Octet_Stream,
Message_Body  => Compute_Image);

--  Serve basic files
elsif AWS.Utils.Is_Regular_File (Filename) then
return AWS.Response.File
(Content_Type => AWS.MIME.Content_Type (Filename),
Filename     => Filename);

else
Put_Line ("Could not find file: " & Filename);

return AWS.Response.Acknowledge
(AWS.Messages.S404,
end if;

end Router;

Here is a sample of a simplified router tree. The Router function is registered as a callback with the AWS framework. It is called whenever a new GET request is received by the server. If the request is for the index page (“/”), we send back the index.html page. If we receive “/fractal” we recompute the fractal image and return the pixel buffer. If the URI is a file, like a css or js file, we simply serve the file. If it’s anything else, we respond with a 404 error.

The AdaFractal app uses a slightly more complex router tree in order to handle input from the user, compute server side zoom of the fractal, and handle computing images of different sizes based on the size of the browser. On the front-end, all of these requests and responses are handled by jQuery. The image that is computed on the backend is an array of pixels using the following layout:

type Pixel is record
Red   : Color;
Green : Color;
Blue  : Color;
Alpha : Color;
end record
with Size => 32;

for Pixel use record
Red at 0 range 0 .. 7;
Green at 1 range 0 .. 7;
Blue at 2 range 0 .. 7;
Alpha at 3 range 0 .. 7;
end record;

type Pixel_Array is array
(Natural range <>) of Pixel
with Pack;

A Pixel_Array with the computed image is sent as the response to the fractal request. The javascript on the front-end overlays that onto a Canvas object which is rendered on the page. No fancy backend image library required!

AWS serves our application to whichever port we specify (the default port is 8080). We can either point our browser to localhost:8080 (127.0.0.1:8080) or if we have a headless device (such as a Raspberry PI, a QNX board, or a VxWorks solution) we can point the browser from another device to the IP address of the headless device port 8080 to view the fractal.

## How about the math and performance?

An interesting side experiment came from this project after trying to increase the performance of the application. After implementing a thread pool to parallelize the math, I decided to try using fixed point types for the math. The implementation of the thread pool and the fixed point versus floating point math performance will be presented in Part 2 of this blog post.

]]>

Over the past several years, a great number of public announcements have been made about companies that are either studying or adopting the Ada and SPARK programming languages. Noteworthy examples include DolbyDensoLASP and Real Heart, as well as the French Security Agency.

Today, NVIDIA, the inventor and market leader of the Graphical Processing Unit (GPU), joins the wave by announcing the selection of SPARK and Ada for its next generation of security-critical GPU firmware running on RISC-V microcontrollers. NVIDIA’s announcement marks a new era of industrial standards for safety- and security-critical software development.

]]>
Proving Memory Operations - A SPARK Journey https://blog.adacore.com/proving-memory-operations-a-spark-journey Tue, 08 Jan 2019 13:33:00 +0000 Quentin Ochem https://blog.adacore.com/proving-memory-operations-a-spark-journey

The promise behind the SPARK language is the ability to formally demonstrate properties in your code regardless of the input values that are supplied - as long as those values satisfy specified constraints. As such, this is quite different from static analysis tools such as our CodePeer or the typical offering available for e.g. the C language, which trade completeness for efficiency in the name of pragmatism. Indeed, the problem they’re trying to solve - finding bugs in existing applications - makes it impossible to be complete. Or, if completeness is achieved, then it is at the cost of massive amount of uncertainties (“false alarms”). SPARK takes a different approach. It requires the programmer to stay within the boundaries of a (relatively large) Ada language subset and to annotate the source code with additional information - at the benefit of being able to be complete (or sound) in the verification of certain properties, and without inundating the programmer with false alarms.

With this blog post, I’m going to explore the extent to which SPARK can fulfill this promise when put in the hands of a software developer with no particular affinity to math or formal methods. To make it more interesting, I’m going to work in the context of a small low-level application to gauge how SPARK is applicable to embedded device level code, with some flavor of cyber-security.

# The problem to solve and its properties

As a prelude, even prior to defining any behavior and any custom property on this code, the SPARK language itself defines a default property, the so-called absence of run-time errors. These include out of bounds access to arrays, out of range assignment to variables, divisions by zero, etc. This is one of the most advanced properties that traditional static analyzers can consider. With SPARK, we’re going to go much further than that, and actually describe functional requirements.

Let’s assume that I’m working on a piece of low level device driver whose job is to set and move the boundaries of two secure areas in the memory: a secure stack and a secure heap area. When set, these areas come with specific registers that prevent non-secure code from reading the contents of any storage within these boundaries. This process is guaranteed by the hardware, and I’m not modeling this part. However, when the boundaries are moved, the data that was in the old stack & heap but not in the new one is now accessible. Unless it is erased, it can be read by non-secure code and thus leak confidential information. Moreover, the code that changes the boundaries is to be as efficient as possible, and I need to ensure that I don’t erase memory still within the secure area.

What I described above, informally, is the full functional behavior of this small piece of code. This could be expressed as a boolean expression that looks roughly like: $dataToErase = (oldStack \cup oldHeap) \cap \overline{(newStack \cup newHeap)}$. Or in other words, the data to erase is everything that was in the previous memory ($$oldStack \cup oldHeap$$) and ($$\cap$$) not in the new memory ($$\overline{(newStack \cup newHeap}$$).

Another way to write the same property is to use a quantifier on every byte of memory, and say that on every byte, if this byte is in the old stack or the old heap but not in the new stack or the new heap, it should be erased: $\forall b \in memory, ((isOldStack(b) \lor isOldHeap (b)) \land \neg (isNewStack (b) \lor isNewHeap (b)) \iff isErased (b))$ Which means that for all the bytes in the memory ($$\forall b \in memory$$) what’s in the old regions ($$isOldStack(b) \lor isOldHeap (b)$$) but not in the new ones ($$\neg (isNewStack (b) \lor isNewHeap (b))$$) has to be erased ($$\iff isErased (b)$$).

We will also need to demonstrate that the heap and the stack are disjoint.

Ideally, we’d like to have SPARK make the link between these two ways of expressing things, as one may be easier to express than the other.

When designing the above properties, it became quite clear that I needed some intermediary library with set operations, in order to be able to express unions, intersections and complement operations. This will come with its own set of lower-level properties to prove and demonstrate.

Let’s now look at how to define the specification for this memory information.

# Memory Specification and Operations

The first step is to define some way to track the properties of the memory - that is whether a specific byte is a byte of heap, of stack, and what kind of other properties they can be associated with (like, has it been erased?). The interesting point here is that the final program executable should not have to worry about these values - not only would it be expensive, it wouldn’t be practical either. We can’t easily store and track the status of every single byte. These properties should only be tracked for the purpose of statically verifying the required behavior.

There is a way in SPARK to designate code to be solely used for the purpose of assertion and formal verification, through so-called “ghost code”. This code will not be compiled to binary unless assertions are activated at run-time. But here we’ll push this one step further by writing ghost code that can’t be compiled in the first place. This restriction imposed on us will allow us to write assertions describing the entire memory, which would be impossible to compile or run.

The first step is to model an address. To be as close as possible to the actual way memory is defined, and to have access to Ada’s bitwise operations, we’re going to use a modular type. It turns out that this introduces a few difficulties: a modular type wraps around, so adding one to the last value goes back to the first one. However, in order to prove absence of run-time errors, we want to demonstrate that we never overflow. To do that, we can define a precondition on the “+” and “-” operators, with an initial attempt to define the relevant preconditions:

function "+" (Left, Right : Address_Type) return Address_Type
is (Left + Right)
with Pre => Address_Type'Last - Left >= Right;

is (Left - Right)
with Pre => Left >= Right;

The preconditions  verify that Left + Right doesn’t exceed Address_Type’Last (for “+”), and that Left - Right is above zero (for “-”). Interestingly, we could have been tempted to write the first precondition the following way:

with Pre => Left + Right <= Address_Type'Last; 

However, with wrap-around semantics inside the precondition itself, this would  always be true.

There’s still a problem in the above code, due to the fact that “+” is actually implemented with “+” itself (hence there’s is an infinite recursive call in the above code). The same goes for “-”. To avoid that, we’re going to introduce a new type “Address_Type_Base” to do the computation without contracts, “Address_Type” being the actual type in use. The full code, together with some size constants (assuming 32 bits), then becomes:

Word_Size    : constant := 32;
Memory_Size  : constant := 2 ** Word_Size;
type Address_Type_Base is mod Memory_Size; -- 0 .. 2**Word_Size - 1

with Pre => Address_Type'Last - Left >= Right;

with Pre => Left >= Right;


Armed with the above types, it’s now time to get started on the modeling of actual memory. We’re going to track the status associated with every byte. Namely, whether a given byte is part of the Stack, part of the Heap, or neither; and whether that byte has been Scrubbed (erased). The prover will reason on the entire memory. However, the status tracking will never exist in the program itself - it will just be too costly to keep all this data at run time. Therefore we’re going to declare all of these entities as Ghost (they are here only for the purpose of contracts and assertions), and we will never enable run-time assertions. The code looks like:

type Byte_Property is record
Stack    : Boolean;
Heap     : Boolean;
Scrubbed : Boolean;
end record
with Ghost;

type Memory_Type is array (Address_Type) of Byte_Property
with Ghost;

Memory : Memory_Type
with Ghost;

Here, Memory is a huge array declared as a global ghost variable. We can’t write executable code with it, but we can write contracts. In particular, we can define a contract for a function that sets the heap between two address values. As a precondition for this function, the lower value has to be below or equal to the upper one. As a postcondition, the property of the memory in the range will be set to Heap. The specification looks like this:

procedure Set_Heap (From, To : Address_Type)
with
Pre => To >= From,
Post => (for all B in Address_Type => (if B in From .. To then Memory (B).Heap else not Memory (B).Heap)),
Global => (In_Out => Memory);


Note that I’m also defining a global here which is how Memory is processed. Here it’s modified, so In_Out.

While the above specification is correct, it’s also incomplete. We’re defining what happens for the Heap property, but not the others. What we expect here is that the rest of the memory is unmodified. Another way to say this is that only the range From .. To is updated, the rest is unchanged. This can be modelled through the ‘Update attribute, and turn the postcondition into:

Post => (for all B in Address_Type =>
(if B in From .. To then Memory (B) = Memory'Old (B)'Update (Heap => True)
else Memory (B) = Memory'Old (B)'Update (Heap => False))),


Literally meaning “The new value of memory equals the old value of memory (Memory’Old) with changes (‘Update) being Heap = True from From to To, and False elsewhere“.

Forgetting to mention what doesn’t change in data is a common mistake when defining contracts. It is also a source of difficulties to prove code, so it’s a good rule of the thumb to always consider what’s unchanged when checking these postconditions. Of course, the only relevant entities are those accessed and modified by the subprogram. Any variable not accessed is by definition unchanged.

Let’s now get to the meat of this requirement. We’re going to develop a function that moves the heap and the stack boundaries, and scrubs all that’s necessary and nothing more. The procedure will set the new heap boundaries between Heap_From .. Heap_To, and stack boundaries between Stack_From and Stack_To, and is defined as such:

procedure Move_Heap_And_Stack
(Heap_From, Heap_To, Stack_From, Stack_To : Address_Type)


Now remember the expression of the requirement from the previous section:

$\forall b \in memory, ((isOldStack(b) \lor isOldHeap (b)) \land \neg (isNewStack (b) \lor isNewHeap (b)) \iff isErased (b))$

This happens to be a form that can be easily expressed as a quantifier, on the Memory array described before:

(for all B in Address_Type =>
(((Memory'Old (B).Stack or Memory'Old (B).Heap)
and then not
(Memory (B).Stack or Memory (B).Heap))
= Memory (B).Scrubbed));


The above is literally the transcription of the property above, checking all bytes B in the address range, and then stating that if the old memory is Stack or Heap and the new memory is not, then the new memory is scrubbed, otherwise not. This contract is going to be the postcondition of Move_Heap_And_Stack. It’s the state of the system after the call.

Interestingly, for this to work properly, we’re going to need also to verify the consistency of the values of Heap_From, Heap_To, Stack_From and Stack_To - namely that the From is below the To and that there’s no overlap between heap and stack. This will be the precondition of the call:

((Stack_From <= Stack_To and then Heap_From <= Heap_To) and then
Heap_From not in Stack_From .. Stack_To and then
Heap_To not in Stack_From .. Stack_To and then
Stack_From not in Heap_From .. Heap_To and then
Stack_To not in Heap_From .. Heap_To);

That’s enough for now at this stage of the demonstration. We have specified the full functional property to be demonstrated. Next step is to implement, and prove this implementation.

# Low-level Set Library

In order to implement the service, it would actually be useful to have a lower level library that implements set of ranges, together with Union, Intersect and Complement functions. This library will be called to determine which regions to erase - it will also be at the basis of the proof. Because this is going to be used at run-time, we need a very efficient implementation. A set of ranges is going to be defined as an ordered disjoint list of ranges. Using a record with discriminant, it looks like:

type Area is record
end record
with Predicate => From <= To;

type Area_Array is array(Natural range <>) of Area;

type Set_Base (Max : Natural) is record
Size  : Natural;
Areas : Area_Array (1 .. Max);
end record;


Note that I call this record Set_Base, and not Set. Bear with me.

You may notice above already a first functional predicate. In the area definition, the fields From and To are described such as From is always below To. This is a check very similar to an Ada range check in terms of where it applies - but on a more complicated property. For Set, I’m also going to express the property I described before, that Areas are ordered (which can can be expressed as the fact that the To value of an element N is below the From value of the element N + 1) and disjoint (the From of element N minus the To of element N + 1 is at least 1). There’s another implicit property to be specified which is that the field Size is below or equal to the Max size of the array. Being able to name and manipulate this specific property has some use, so I’m going to name it in an expression function:

function Is_Consistent (S : Set_Base) return Boolean is
(S.Size <= S.Max and then
((for all I in 1 .. S.Size - 1 =>
S.Areas (I).To < S.Areas (I + 1).From and then
S.Areas (I + 1).From - S.Areas (I).To > 1)));

Now comes the predicate. If I were to write:

type Set_Base (Max : Natural) is record
Size  : Natural;
Areas : Area_Array (1 .. Max);
end record
with Predicate => Is_Consistent (Set_Base);


I would have a recursive predicate call. Predicates are checked in particular when passing parameters, so Is_Consistent would check the predicate of Set_Base, which is a call to Is_Consistent, which would then check the predicate and so on. To avoid this, the predicate is actually applied to a subtype:

subtype Set is Set_Base
with Predicate => Is_Consistent (Set);


As it will later turn out, this property is very fundamental to the ability for proving other properties. At this stage, it’s already nice to see some non-trivial property being expressed, namely that the structure is compact (it doesn't waste space by having consecutive areas that could be merged into one, or said otherwise, all areas are separated by at least one excluded byte).

The formal properties expressed in the next steps will be defined in the form of inclusion - if something is included somewhere then it may or may not be included somewhere else. This inclusion property is expressed as a quantifier over all the ranges. It’s not meant to be run by the program, but only for the purpose of property definition and proof. The function defining that a given byte is included in a set of memory ranges can be expressed as follows:

function Includes (B : Address_Type; S : Set) return Boolean
is (for some I in 1 .. S.Size =>
B in S.Areas (I).From .. S.Areas (I).To)
with Ghost;


Which means that for all the areas in the set S, B is included in the set if B is included in at least one (some) of the areas in the set.

I’m now going to declare a constructor “Create” together with three operators, “or”, “and”, “not” which will respectively implement union, intersection and complement. For each of those, I need to provide some expression of the maximum size of the set before and after the operation, as well as the relationship between what’s included in the input and in the output.

The specification of the function Create is straightforward. It takes a range as input, and creates a set where all elements within this range are contained in the resulting set. This reads:

function Create (From, To : Address_Type) return Set
with Pre => From <= To,
Post => Create'Result.Max = 1
and then Create'Result.Size = 1
and then (for all B in Address_Type =>
Includes (B, Create'Result) = (B in From .. To));


Note that interestingly, the internal implementation of the Set isn’t exposed by the property computing the inclusion. I’m only stating what should be included without giving details on how it should be included. Also note that as in many other places, this postconditon isn’t really something we’d like to execute (that would be possibly a long loop to run for large area creation). However, it’s a good way to model our requirement.

Let’s carry on with the “not”. A quick reasoning shows that at worst, the result of “not” is one area bigger than the input. We’ll need a precondition checking that the Size can indeed be incremented (it does not exceed the last value of the type). The postcondition is that this Size has been potentially incremented, that values that were not in the input Set are now in the resulting one and vice-versa. The operator with its postcondition and precondition reads:

function "not" (S : Set) return Set
with
Pre => Positive'Last - S.Size > 0,
Post =>
(for all B in Address_Type =>
Includes (B, "not"'Result) /= Includes (B, S))
and then "not"'Result.Size <= S.Size + 1;


The same reasoning can be applied to “and” and “or”, which leads to the following specifications:

function "or" (S1, S2 : Set) return Set
with
Pre => Positive'Last - S1.Size - S2.Size >= 0,
Post => "or"'Result.Size <= S1.Size + S2.Size
and Is_Consistent ("or"'Result)
and (for all B in Address_Type =>
(Includes (B, "or"'Result)) =
((Includes (B, S1) or Includes (B, S2))));

function "and" (S1, S2 : Set) return Set
with
Pre => Positive'Last - S1.Size - S2.Size >= 0,
Post => "and"'Result.Size <= S1.Size + S2.Size
and (for all B in Address_Type =>
(Includes (B, "and"'Result)) =
((Includes (B, S1) and Includes (B, S2))));


Of course at this point, one might be tempted to first prove the library and then the user code. And indeed I was tempted and fell for it. However, as this turned out to be a more significant endeavor, let’s start by looking at the user code.

# Move_Heap_And_Stack - the “easy” part

Armed with the Set library, implementing the move function is relatively straightforward. We’re using other services to get the heap and stack boundaries, then creating the set, using the proper operators to create the list to scrub, and then scrubbing pieces of memory one by one, then finally setting the new heap and stack pointers.

procedure Move_Heap_And_Stack
(Heap_From, Heap_To, Stack_From, Stack_To : Address_Type)
is
Prev_Heap_From, Prev_Heap_To,
begin
Get_Stack_Boundaries (Prev_Stack_From, Prev_Stack_To);
Get_Heap_Boundaries (Prev_Heap_From, Prev_Heap_To);

declare
Prev : Set := Create (Prev_Heap_From, Prev_Heap_To) or
Create (Prev_Stack_From, Prev_Stack_To);
Next : Set := Create (Heap_From, Heap_To) or
Create (Stack_From, Stack_To);
To_Scrub : Set := Prev and not Next;
begin
for I in 1 .. To_Scrub.Size loop
Scrub (To_Scrub.Areas (I).From, To_Scrub.Areas (I).To);
end loop;

Set_Stack (Stack_From, Stack_To);
Set_Heap (Heap_From, Heap_To);
end;
end Move_Heap_And_Stack;


Now let’s dive into the proof. As a disclaimer, the proofs we’re going to do for now on are hard. One doesn’t need to go this far to take advantage of SPARK. As a matter of fact, defining requirements formally is already taking good advantage of the technology. Most people only prove data flow or absence of run-time errors which is already a huge win. The next level is some functional key properties. We’re going one level up and entirely proving all the functionalities. The advanced topics that are going to be introduced in this section, such as lemma and loop invariants, are mostly needed for these advanced levels.

The first step is to reset the knowledge we have on the scrubbing state of the memory. Remember that all of the memory state is used to track the status of the memory, but it does not correspond to any real code. To reset the flag, we’re going to create a special ghost procedure, whose sole purpose is to set these flags:

procedure Reset_Scrub
with
Post =>
(for all B in Address_Type =>
Memory (B) = Memory'Old(B)'Update (Scrubbed => False)),
Ghost;


In theory, it is not absolutely necessary to provide an implementation to this procedure if it’s not meant to be compiled. Knowing what it’s supposed to do is good enough here. However, it can be useful to provide a ghost implementation to describe how it would operate. The implementation is straightforward:

procedure Reset_Scrub is
begin
Memory (B).Scrubbed := False;
end loop;
end Reset_Scrub;

We’re going to hit now our first advanced proof topic. While extremely trivial, the above code doesn’t prove, and the reason why it doesn’t is because it has a loop. Loops are something difficult for provers and as of today, they need help to break them down into sequential pieces. While the developer sees a loop, SPARK sees three different pieces of code to prove, connected by a so-called loop invariant which summarizes the behavior of the loop:

procedure Reset_Scrub is
begin
Memory (B).Scrubbed := False;
[loop invariant]


Then:

         [loop invariant]
B := B + 1;
Memory (B).Scrubbed := False;
[loop invariant]

And eventually:

         [loop invariant]
end loop;
end;


The difficulty is now about finding this invariant that is true on all these sections of the code, and that ends up proving the postcondition. To establish those, it’s important to look at what needs to be proven at the end of the loop. Here it would be that the entire array has Scrubbed = False, and that the other fields still have the same value as at the entrance of the loop (expressed using the attribute ‘Loop_Entry):

(for all X in Address_Type =>
Memory (X) = Memory'Loop_Entry(X)'Update (Scrubbed => False))


Then to establish the loop invariant, the question is how much of this is true at each step of the loop. The answer here is that this is true up until the value B. The loop invariant then becomes:

(for all X in Address_Type'First .. B =>
Memory (X) = Memory'Loop_Entry(X)'Update (Scrubbed => False))


Which can be inserted in the code:

procedure Reset_Scrub is
begin
Memory (B).Scrubbed := False;
pragma Loop_Invariant
(for all X in Address_Type'First .. B =>
Memory (X) =
Memory'Loop_Entry(X)'Update (Scrubbed => False))
end loop;
end Reset_Scrub;


Back to the main code. We can now insert a call to Reset_Scrub before performing the actual scrubbing. This will not do anything in the actual executable code, but will tell the prover to consider that the ghost values are reset. Next, we have another loop scrubbing subsection:

for I in 1 .. To_Scrub.Size loop
Scrub (To_Scrub.Areas (I).From, To_Scrub.Areas (I).To);
end loop;


Same as before, the question is here what is the property true at the end of the loop, and breaking this property into what’s true at each iteration. The property true at the end is that everything included in the To_Scrub set has been indeed scrubbed:

(for all B in Address_Type =>
Includes (B, To_Scrub) = Memory (B).Scrubbed);

This gives us the following loop with its invariant:

for I in 1 .. To_Scrub.Size loop
Scrub (To_Scrub.Areas (I).From, To_Scrub.Areas (I).To);

-- everything until the current area is properly scrubbed
pragma Loop_Invariant
(for all B in Address_Type'First .. To_Scrub.Areas (I).To =>
Includes (B, To_Scrub) = Memory (B).Scrubbed);
end loop;

So far, this should be enough. Establishing these loop invariants may look a little bit intimidating at first, but with a little bit of practise they become rapidly straightforward.

Unfortunately, this is not enough.

# Implementing Unproven Interaction with Registers

As this is a low level code, some data will not be proven in the SPARK sense. Take for example the calls that read the memory boundaries:

procedure Get_Heap_Boundaries (From, To : out Address_Type)
with Post => (for all B in Address_Type => (Memory (B).Heap = (B in From .. To)))
and then From <= To;

procedure Get_Stack_Boundaries (From, To : out Address_Type)
with Post => (for all B in Address_Type => (Memory (B).Stack = (B in From .. To)))
and then From <= To;


The values From and To are possibly coming from registers. SPARK wouldn’t necessary be able to make the link with the post condition just because the data is outside of its analysis. In this case, it’s perfectly fine to just tell SPARK about a fact without having to prove it. There are two ways to do this, one is to deactivate SPARK from the entire subprogram:

 procedure Get_Heap_Boundaries
with SPARK_Mode => Off
is
begin
-- code
end Get_Heap_Boundaries;

In this case, SPARK will just assume that the postcondition is correct. The issue is that there’s no SPARK analysis on the entire subprogram, which may be too much. An alternative solution is just to state the fact that the postcondition is true at the end of the subprogram:

procedure Get_Heap_Boundaries
is
begin
-- code

pragma Assume (for all B in Address_Type => (Memory (B).Heap = (B in From .. To)));
end Get_Heap_Boundaries;

In this example, to illustrate the above, the registers will be modeled as global variables read and written from C - which is outside of SPARK analysis as registers would be.

# Move_Heap_And_Stack - the “hard” part

Before diving into what’s next and steering away readers from ever doing proof, let’s step back a little. We’re currently set to doing the hardest level of proof - platinum. That is, fully proving a program’s functional behavior. There is a lot of benefit to take from SPARK prior to reaching this stage. The subset of the language alone provides more analyzable code. Flow analysis allows you to easily spot uninitialized data. Run-time errors such as buffer overflow are relatively easy to clear out, and even simple gold property demonstration is reachable by most software engineers after a little bit of training.

Full functional proof - that is, complex property demonstration - is hard. It is also not usually required. But if this is what you want to do, there’s a fundamental shift of mindset required. As it turns out, it took me a good week to understand that. For a week as I was trying to build a proof from bottom to top, adding various assertions left and right, trying to make things fits the SPARK expectations. To absolutely no results.

And then, in the midst of despair, the apple fell on my head.

The prover is less clever than you think. It’s like a kid learning maths. It’s not going to be able to build a complex demonstration to prove the code by itself. Asking it to prove the code is not the right approach. The right approach is to build the demonstration of the code correctness, step by step, and to prove that this demonstration is correct. This demonstration is a program in itself, with its own subprograms, its own data and control flow, its own engineering and architecture. What the prover is going to do is to prove that the demonstration of the code is correct, and as the demonstration is linked to the code, the code happens to be indirectly proven correct as well.

Now reversing the logic, how could I prove this small loop? One way to work that out is to describe scrubbed and unscrubbed areas one by one:

• On the first area to scrub, if it doesn’t start at the beginning of the memory, everything before the start has not been scrubbed.

• When working on any area I beyond the first one, everything between the previous area I - 1 and the current one has not been scrubbed

• At the end of an iteration, everything beyond the current area I is unscrubbed.

• At the end of the last iteration, everything beyond the last area is unscrubbed

The first step to help the prover is to translate all of these into assertions, and see if these steps are small enough for the demonstration to be proven - and for the demonstration to indeed prove the code. At this stage, it’s not a bad idea to describe the assertion in the loop as loop invariants as we want the information to be carried from one loop iteration to the next. This leads to the following code:

for I in 1 .. To_Scrub.Size loop
Scrub (To_Scrub.Areas (I).From, To_Scrub.Areas (I).To);

pragma Loop_Invariant
(if I > 1 then (for all B in Address_Type range To_Scrub.Areas (I - 1).To + 1 .. To_Scrub.Areas (I).From - 1 =>
not Memory (B).Scrubbed));

pragma Loop_Invariant
then (for all B in Address_Type'First .. To_Scrub.Areas (1).From - 1 => not Memory (B).Scrubbed));

pragma Loop_Invariant
(if To_Scrub.Areas (I).To < Address_Type'Last then
(for all B in To_Scrub.Areas (I).To + 1 .. Address_Type'Last => not Memory (B).Scrubbed));

pragma Loop_Invariant
(for all B in Address_Type'First .. To_Scrub.Areas (I).To => Includes (B, To_Scrub) = Memory (B).Scrubbed);
end loop;

pragma Assert
(if To_Scrub.Size >= 1 and then To_Scrub.Areas (To_Scrub.Size).To < Address_Type'Last then
(for all B in To_Scrub.Areas (To_Scrub.Size).To + 1 .. Address_Type'Last => not Memory (B).Scrubbed));


Results are not too bad at first sight. Out of the 5 assertions, only 2 don’t prove. This may mean that they’re wrong - this may also mean that SPARK needs some more help to prove.

Let’s look at the first one in more details:

pragma Loop_Invariant
then (for all B in Address_Type'First .. To_Scrub.Areas (1).From - 1 => not Memory (B).Scrubbed));


Now if we were putting ourselves in the shoes of SPARK. The prover doesn’t believe that there’s nothing scrubbed before the first element. Why would that be the case? None of these bytes are in the To_Scrub area, right? Let’s check. To investigate this, the technique is to add assertions to verify intermediate steps, pretty much like what you’d do with the debugger. Let’s add an assertion before:

pragma Assert
then (for all B in Address_Type'First .. To_Scrub.Areas (1).From - 1 => not Includes (B, To_Scrub)));


That assertion doesn’t prove. But what would this be true? Recall that we have a consistency check for all sets which is supposed to be true at this point, defined as:

function Is_Consistent (S : Set_Base) return Boolean is
(S.Size <= S.Max and then
((for all I in 1 .. S.Size - 1 =>
S.Areas (I).To < S.Areas (I + 1).From and then
S.Areas (I + 1).From - S.Areas (I).To > 1)));


So looking at the above, if all areas are after the first one, there should be nothing before the first one. If Is_Consistent is true for To_Scrub, then the assertion ought to be true. Yet SPARK doesn’t believe us.

When reaching this kind of situation, it’s good practise to factor out the proof. The idea is to create a place where we say “given only these hypotheses, can you prove this conclusion?”. Sometimes, SPARK is getting lost in the wealth of information available, and just reducing the number of hypothesis to consider to a small number is enough for get it to figure out something.

Interestingly, this activity of factoring out a piece of proof is very close to what you’d do for a regular program. It’s also easier for the developer to understand small pieces of code than a large flat program. The prover is no better than that.

These factored out proofs are typically referred to as lemmas. They are Ghost procedures that prove a postcondition from a minimal precondition. For convention, we’ll call them all Lemma something. The Lemma will look like:

procedure Lemma_Nothing_Before_First (S : Set) with
Ghost,
Pre => Is_Consistent (S),
Post =>
(if S.Size = 0 then (for all B in Address_Type => not Includes (B, S))
elsif S.Areas (1).From > Address_Type'First then
(for all B in Address_Type'First .. S.Areas (1).From - 1 => not Includes (B, S)));


Stating that if S is consistent, then either it’s null (nothing is included) or all elements before the first From are not included.

Now let’s see if reducing the scope of the proof is enough. Let’s just add an empty procedure:

procedure Lemma_Nothing_Before_First (S : Set) is
begin
null;
end Lemma_Nothing_Before_First;


Still no good. That was a good try though. Assuming we believe to be correct here (and we are), the game is now to demonstrate to SPARK how to go from the hypotheses to the conclusion.

To do so we need to take into account one limitation of SPARK - it doesn’t do induction. This has a significant impact on what can be deduced from one part of the hypothesis:

(for all I in 1 .. S.Size - 1 =>
S.Areas (I).To < S.Areas (I + 1).From and then
S.Areas (I + 1).From - S.Areas (I).To > 1)


If all elements “I” are below the “I + 1” element, then I would like to be able to check that all “I” are below all the “I + N” elements after it. This ability to jump from proving a one by one property to a whole set is called induction. This happens to be extremely hard to do for state-of-the-art provers. Here lies our key. We’re going to introduce a new lemma that goes from the same premise, and then demonstrates that it means that all the areas after a given one are greater:

procedure Lemma_Order (S : Set) with
Ghost,
Pre => (for all I in 1 .. S.Size - 1 => S.Areas (I).To < S.Areas (I + 1).From),
Post =>
(for all I in 1 .. S.Size - 1 => (for all J in I + 1 .. S.Size => S.Areas (I).To < S.Areas (J).From));


And we’re going to write the demonstration as a program:

procedure Lemma_Order (S : Set)
is
begin
if S.Size = 0 then
return;
end if;

for I in 1 .. S.Size - 1 loop
for J in I + 1 .. S.Size loop
pragma Assert (S.Areas (J - 1).To < S.Areas (J).From);
pragma Loop_Invariant (for all R in I + 1 .. J => S.Areas (I).To < S.Areas (R).From);
end loop;

pragma Loop_Invariant
((for all R in 1 .. I => (for all T in R + 1 .. S.Size => S.Areas (R).To < S.Areas (T).From)));
end loop;
end Lemma_Order;


As you can see here, for each area I, we’re checking that the area I + [1 .. Size] are indeed greater. This happens to prove trivially with SPARK. We can now prove Lemma_Nothing_Before_First by applying the lemma Lemma_Order. To apply a lemma, we just need to call it as a regular function call. Its hypotheses (precondition) will be checked by SPARK, and its conclusion (postcondition) added to the list of hypotheses available to prove:

procedure Lemma_Nothing_Before_First (S : Set) is
begin
Lemma_Order (S);
end Lemma_Nothing_Before_First;


This now proves trivially. Back to the main loop, applying the lemma Lemma_Nothing_Before_First looks like:

for I in 1 .. To_Scrub.Size loop
Scrub (To_Scrub.Areas (I).From, To_Scrub.Areas (I).To);
Lemma_Nothing_Before_First (To_Scrub);

pragma Loop_Invariant
(if I > 1 then
(for all B in Address_Type range To_Scrub.Areas (I - 1).To + 1 .. To_Scrub.Areas (I).From - 1 => not Memory (B).Scrubbed));

pragma Loop_Invariant
then (for all B in Address_Type'First .. To_Scrub.Areas (1).From - 1 => not Memory (B).Scrubbed));

pragma Loop_Invariant
(if To_Scrub.Areas (I).To < Address_Type'Last then
(for all B in To_Scrub.Areas (I).To + 1 .. Address_Type'Last => not Memory (B).Scrubbed));

pragma Loop_Invariant
(for all B in Address_Type'First .. To_Scrub.Areas (I).To => Includes (B, To_Scrub) = Memory (B).Scrubbed);
end loop;

pragma Assert
(if To_Scrub.Size >= 1 and then To_Scrub.Areas (To_Scrub.Size).To < Address_Type'Last then
(for all B in To_Scrub.Areas (To_Scrub.Size).To + 1 .. Address_Type'Last => not Memory (B).Scrubbed));


And voila! One more loop invariant now proving properly.

# More madness, in a nutshell

At this point, it’s probably not worth diving into all the details of this small subprogram - the code is available here. There’s just more of the same.

The size of this small function is relatively reasonable. Now let’s give some insights on a much more difficult problem: the Set library.

A generous implementation brings about 250 lines of code (it could actually be less if condensed, but let’s start with this). That’s a little bit less than a day of work for implementation and basic testing.

For the so called silver level - that is absence of run-time errors, add maybe around 50 lines of assertions and half a day of work. Not too bad.

For gold level, I decided to prove one key property. Is_Consistent, to be true after each operator. Maybe a day of work was needed for that one. Add another 150 lines of assertions maybe. Still reasonable.

Platinum is about completely proving the functionality of my subprogram. And that proved (pun intended) to be a much much more difficult experience. See this link and this link for other similar experiences. As a disclaimer, I am an experienced Ada developer but had relatively little experience with proof. I also selected a problem relatively hard - the quantified properties and the Set structure are quite different, and proving quantifiers is known to be hard for provers to start with.With that in mind, the solution I came up with spreads over almost a thousand lines of code - and consumed about a week and a half of effort.

I’m also linking here a solution that my colleague Claire Dross came up with. She’s one of our most senior expert in formal proof, and within a day could prove the two most complex operators in about 300 lines of code (her implementation is also more compact than mine).

The above raises a question - is it really worth it? Silver absolutely does - it is difficult to bring a case against spending a little bit more effort in exchange for the absolute certainty of never having a buffer overflow or a range check error. There’s no doubt that in the time I spent in proving this, I would have spent much more in debugging either testing, or worse, errors in the later stages should this library be integrated in an actual product. Gold is also a relatively strong case. The fact that I only select key properties allows only concentrating on relatively easy stuff, and the confidence of the fact that they are enforced and will never have to be tested clearly outweighs the effort.

I also want to point out that the platinum effort is well worth it on the user code in this example. While it looks tedious at first sight, getting these properties right is relatively straightforward, and allows gaining confidence on something that can’t be easily tested; that is, a property on the whole memory.

Now the question remains - is it worth the effort on the Set library, to go from maybe two days of code + proof to around a week and a half?

I can discuss it either way but having to write 700 lines of code to demonstrate to the prover that what I wrote is correct keeps haunting me. Did I really have these 700 lines of reasoning in my head when I developed the code? Did I have confidence in the fact that each of those was logically linked to the next? To be fair, I did find errors in the code when writing those, but the code wasn’t fully tested when I started the proof. Would the test have found all the corner cases? How much time would such a corner case take to debug if found in a year? (see this blog post for some insights on hard to find bugs removed by proof).

Some people who safely certify software against e.g. avionics & railway standards end up writing 10 times more lines of tests than code - all the while just verifying samples of potential data. In that situation, provided that the properties under test can be modelled by SPARK assertions and that they fit what the prover knows how to do, going through this level of effort is a very strong case.

Anything less is open for debate. I have to admit, against all odds, it was a lot of fun and I would personally be looking forward to taking the challenge again. Would my boss allow me to is a different question. It all boils down to the cost of a failure versus the effort to prevent said failure. Being able to make an enlightened decision might be the most valuable outcome of having gone through the effort.

]]>
​Amazon Relies on Formal Methods for the Security of AWS https://blog.adacore.com/amazon-relies-on-formal-methods-for-the-security-of-aws Tue, 23 Oct 2018 13:17:09 +0000 Yannick Moy https://blog.adacore.com/amazon-relies-on-formal-methods-for-the-security-of-aws

Byron Cook, who founded and leads the Automated Reasoning Group at Amazon Web Services (AWS) Security, gave a powerful talk at the Federated Logic Conference in July about how Amazon uses formal methods for ensuring the security of parts of AWS infrastructure. In the past four years, this group of 20+ has progressively hired well-known formal methods experts to face the growing demand inside AWS to develop tools based on formal verification for reasoning about cloud security.

Cook summarizes very succinctly the challenge his team is addressing at 17:25 in the recording: "How does AWS continues to scale quickly and securely?"

A message that Cook hammers out numerous times in his talk is that "soundness is key". See at 25:05 as he explains that some customers value so much the security guarantees that AWS can offer with formal verification that it justified their move to AWS.

Even closer to what we do with SPARK, he talks at 26:42 about source code verification, and has this amaz(on)ing quote: "Proof is an accelerator for adoption. People are moving orders of magnitude workload more because they're like 'in my own data center I don't have proof' but there they have proofs."

In the companion article that was published at the conference, Cook gives more details about what the team has achieved and where they are heading now:

In  2017  alone  the  security  team  used  deductive  theorem provers or model checking tools to reason about cryptographic protocols/systems, hypervisors, boot-loaders/BIOS/firmware, garbage collectors, and network designs.
In many cases we use formal verification tools continuously to ensure that security is implemented as designed. In this scenario, whenever changes and  updates  to  the  service/feature  are  developed,  the  verification  tool is  reexecuted automatically prior to the deployment of the new version.
The customer reaction to features based on formal reasoning tools has been overwhelmingly  positive,  both  anecdotally  as  well  as  quantitatively.  Calls by AWS services to the automated reasoning tools increased by four orders of magnitude in 2017. With the formal verification tools providing the semantic foundation, customers can make stronger universal statements about their policies and networks and be confident that their assumptions are not violated.

While AWS certainly has unique security challenges that justify a strong investment in security, it's not unique in depending on complex software for its operations. What is unique so far is the level of investment at AWS in formal verification as a means to radically eliminate some security problems, both for them and for their customers.

This is certainly an approach we're eager to support with our own investment in the SPARK technology.

]]>

## The challenge

Are you ready to develop a project to the highest levels of safety, security and reliability? If so, Make with Ada is the challenge for you! We’re calling on embedded developers across the globe to build cool embedded applications using the Ada and SPARK programming languages and are offering over $8000 in total prizes. In addition, eligible students will compete for a reward of an Analog Discovery 2 Pro Bundle worth$299.99!

For this competition, we chose not to set a project theme because we want you to be able to demonstrate your inventiveness and to work on a project that motivates you. We’re inviting you to contribute to the world of safe, secure and reliable software by using Ada/SPARK to build something that matters. Learn how to program in Ada and SPARK here – learn.adacore.com.

It's time to Make with Ada. Find out more on Hackster.io.

]]>

This course is geared to software professionals looking for a practical introduction to the Ada language with a focus on embedded systems, including real-time features as well as critical features introduced in Ada 2012. By attending this course you will understand and know how to use Ada for both sequential and concurrent applications, through a combination of live lectures from AdaCore's expert instructors and hands-on workshops using AdaCore's latest GNAT technology. AdaCore will provide an Ada 2012 tool-chain and ARM-based target boards for embedded workshops. No previous experience with Ada is required.

The course will be conducted in English.

Prerequisite: Knowledge of a programming language (Ada 83, C, C++, Java…).

Each participant should come with a computer having at least one USB port and running Windows.

For the full agenda and to register, visit: https://www.adacore.com/public...

### Attachments

]]>

I was looking for a topic for my master thesis in embedded systems engineering when one of my advisor proposed the idea of programming a control system for autonomous trains in Ada. Since I am fascinated by the idea of autonomous vehicles I agreed immediately without knowing Ada.

The obligatory "hello world" was written within a few minutes, so nothing could stop me from writing an application for autonomous trains. I was such a fool. The first days of coding were really hard and it felt like writing a Shakespeare sonnet in a language I had never heard but after a few days and a lot of reading in "Ada 2012" by John Barns I recognized the beauty and efficiency of this language.

Half a year had to go by before the first lines of code for my project could be written. First, I had to define requirements and choose which hardware to use. As a start, my advisor suggested Raspberry Pi 3 as implementation platform. Then I had to dive into the world of model trains to develop my first model railway layout.

A lot of soldering and wiring was required to build such a simple track (see image below). What a perfect job for a software developer, but as a prospective engineer nothing seemed impossible. It took two months until the whole hardware part was set up. Thanks to my colleagues, electronic engineers, we managed to set up everything as required and stabilized the signals until the interference from the turnout drives was eliminated.

The system architecture  (see below) is a bit complicated since a lot of different components had to be taken into account.

To communicate with the trains, turnouts and train position detectors I used an Electronic Solutions Ulm Command Station 50210 (ECoS). It uses the DCC Railcom protocol to communicate with the attached hardware. There are two possibilities to interact with the ECoS -- either an IP interface to send commands  or the touchscreen.

The Messaging Raspberry Pi (MRP) receives all messages of the system and continuously polls  its interfaces to detect changes for example if a button was pressed to call a train to a specific station.  A colleague wrote the software for this part in C#. All messages are converted into a simple format, are transferred to the Control Raspberry Pi and are finally interfaced to the Ada train control software.

A third microcontroller (an STM32F1) triggers the attached LEDs  to display if a part of the track can be passed through. The  respective commands are sent via UART to the STM32F1. The  STM32 software is written in C, but will be rewritten in Ada in the next winter semester by my students during their Ada lessons. If you  allow me a side note: I was overwhelmed by the simplicity of Adas multitasking mechanisms, so that I decided to change the content of my safety programming lectures from C and the MISRA C standard to Ada and SPARK, but that is another story.

A goal of my master thesis was to program an on demand autonomous train system. Therefore, trains have to drive autonomously to the different stations,  turnouts must be changed,  signals must be set and trains must be detected along the track. Each train has its own task storing and processing information: it has to identify its position on the track or the next station to stop at  and to offer information like which message it expects next. One main task receives all the messages from the MRP, analyzes them and passes them to the different train tasks. The main part of the software is to handle and analyze all the  messages, since each component like the signals or turnouts have their own protocols to adhere to.  Thankfully, rendezvous in Ada are very easy to implement so it was fun to write this part of code.

The following video shows a train which is called to a station and drives to a terminus station after  stopping at the station and picking up passengers. As the train starts from the middle station, another train has to drive to the middle station. The middle station has to be  occupied at all times because waiting times must be kept as short as possible. The signals are not working in this video because I started a new project with a bigger track and already detached most of the parts.

]]>

When I bought the TinyFPGA-BX board, I thought it would be an opportunity to play a little bit with FPGA, learn some Verilog or VHDL. But when I discovered that it was possible to have a RISC-V CPU on it, I knew I had to run Ada code on it.

The RISC-V CPU in question is the PicoRV32 from Clifford Wolf. It is written in Verilog and implements the RISC-V 32bits instruction set (IMC extensions). In this blog post I will explain how I added support for this CPU and made an example project.

# Compiler and run-time

More than a year ago I wrote a blog post about building an experimental Ada compiler and running code the first RISC-V micro-controller, the HiFive1. Since then, we released an official support of RISC-V in the Pro and Community editions of GNAT so you don’t even have to build the compiler anymore.

For the run-time, we will start from the existing HiFive1 run-time and change a few things to match the specs of the PicoRV32. As you will see it’s very easy.

Compared to the HiFive1, the PicoRV32 run-time will have

• A different memory map (RAM and flash)

• A different text IO driver (UART)

• Different instruction set extensions

## Memory map

This step is very simple, we just use linker script syntax to declare the two memory areas of the TinyFPGA-BX chip (ICE40):

MEMORY
{
flash (rxai!w) : ORIGIN = 0x00050000, LENGTH = 0x100000
ram (wxa!ri)   : ORIGIN = 0x00000000, LENGTH = 0x002000
}


## Text IO driver

Again this is quite simple, the UART peripheral provided with PicoRV32 only has two registers. We first declare them at their respective addresses:

   UART_CLKDIV : Unsigned_32

UART_Data : Unsigned_32


Then the code just initializes the peripheral by setting the clock divider register to get a 115200 baud rate, and sends characters to the data register:

   procedure Initialize is
begin
UART_CLKDIV := 16_000_000 / 115200;
Initialized := True;
end Initialize;

procedure Put (C : Character) is
begin
UART_Data := Unsigned_32 (Character'Pos (C));
end Put;

## Run-time build script

The last modification is to the Python scripts that create the run-times. To create a new run-time, we add a class that defines different properties like compiler switches or the list of files to include:

class PicoRV32(RiscV32):
@property
def name(self):
return 'picorv32'

@property
def compiler_switches(self):
# The required compiler switches
return ['-march=rv32imc', '-mabi=ilp32']

@property
def has_small_memory(self):
return True

@property
return ['ROM']

def __init__(self):
super(PicoRV32, self).__init__()

# Use the same base linker script as the HiFive1

# Use the same startup code as the HiFive1



## Building the run-time

Once all the changes are made, it is time to build the run-time.

Then run the script inside the bb-runtime repository to create the run-time:

$git clone https://github.com/AdaCore/bb-runtimes$ cd bb-runtimes
$./build_rts.py --bsps-only --output=temp picorv32 Compile the generated run-time: $ gprbuild -P temp/BSPs/zfp_picorv32.gpr


And install:

$gprinstall -p -f -P temp/BSPs/zfp_picorv32.gpr This is it for the run-time. For a complete view of the changes, you can have a look at the commit on GitHub: here. # Example project Now that we can compile Ada code for the PicoRV32, let’s work on an example project. I wanted to include a custom Verilog module, otherwise what’s the point of using an FPGA, right? So I made a peripheral that controls WS2812 RGB LEDs, also known as Neopixels. I won’t explain the details of this module, I would just say that digital logic is difficult for the software engineer that I am :) The hardware/software interface is a memory mapped register that once written to, sends a WS2812 data frame. To control a strip of multiple LEDs, the software just has to write to this register multiple times in a loop. The example software is relatively simple, the Neopixel driver package is generic so that it can handle different lengths of LED strips. The address of the memory mapped register is also a parameter of the generic, so it is possible to have multiple peripherals controlling different LED strips. The memory mapped register is defined using Ada’s representation clauses and Address attribute:  type Pixel is record B, R, G : Unsigned_8; end record with Size => 32, Volatile_Full_Access; for Pixel use record B at 0 range 0 .. 7; R at 0 range 8 .. 15; G at 0 range 16 .. 23; end record; Data_Register : Pixel with Address => Peripheral_Base_Addr;  Then an HSV to RGB conversion function is used to implement different animations on the LED strip, like candle simulation or rainbow effect. And finally there are button inputs to select the animation and the intensity of the light. Both hardware and software sources can be found in this repository. I recommend to follow the TinyFPGA-BX User Guide first to get familiar with the board and how the bootloader works. Feeling inspired and want to start Making with Ada? We have the perfect challenge for you! The Make with Ada competition, hosted by AdaCore, calls on embedded developers across the globe to build cool embedded applications using the Ada and SPARK programming languages and offers over €8000 in total prizes. ]]> AdaCore major sponsor at HIS 2018 https://blog.adacore.com/adacore-major-sponsor-at-his-2018 Wed, 08 Aug 2018 09:51:00 +0000 Pamela Trevino https://blog.adacore.com/adacore-major-sponsor-at-his-2018 We are happy to announce that, AdaCore, alongside Altran and Jaguar Land Rover will be major sponsors of the fifth edition of the renowned High Integrity Software Conference on the 6th November in Bristol! The core themes of the conference this year will be Assured Autonomous Systems, Hardware and Software Architectures, Cyber Security – People & Practice, Languages & Applications. We will address the relationship between software vulnerability, security and safety, and the far-reaching impacts it can have. This year's programme offers a wide and diverse selection of talks, including our keynotes presented by Simon Burton from Bosch and Dr. Andrew Blyth from the University of Wales. Speakers from ANSSI, Assuring Autonomy International Programme - University of York, BAE Systems, CyBOK, Imperial College, IRM, Meridian Mobile, NCSC, RealHeart, Rolls-Royce, University of Edinburgh will be giving technical sessions on building trustworthy software for embedded, connected and infrastructure systems across a range of application domains. For the full agenda and to register: https://www.his-2018.co.uk/ ]]> Learn.adacore.com is here https://blog.adacore.com/learn-adacore-com-is-here Wed, 25 Jul 2018 14:47:00 +0000 Fabien Chouteau https://blog.adacore.com/learn-adacore-com-is-here We are very proud to announce the availability of our new Ada and SPARK learning platform: learn.adacore.com. Following on from the AdaCoreU(niversity) e-learning platform, this website is the next step in AdaCore’s endeavour to provide a better online learning experience for the Ada and SPARK programming languages. This new website is designed for individuals who want to get up and running with Ada/SPARK, and also for teams or teachers looking for training or tutorial material based on Ada/SPARK. In designing the site, we decided to evolve from the video-based approach used for AdaCoreU and instead have created text-based, interactive content to ease the learning experience. In light of this, AdaCoreU will be decommissioned in the coming weeks but the course videos are already available on YouTube and the base material, including slides, remains available on GitHub for people who want to use it for their courses or trainings. The main benefit of a textual approach is the greater flexibility in how you advance through the course. Now you can easily pick and choose from the course material with the opportunity to move on to more advanced sections but then refer back to previous content when needed. We also provide greater interactivity through code snippets embedded in a widget that allows you to compile, run and even prove your code (in the case of SPARK) directly from your Web browser. This allows you to experiment with the tools without having to install them, and tweak the examples to gain a better understanding for what’s allowed and feasible. As for course content production, this new format also allows greater interactivity with the community. The source code for the courses are in reST (reStructuredText) format, which makes it easy to edit and collaborate. It’s also hosted on GitHub so that everyone can suggest fixes or ideas for new content. For the release of learn.adacore.com, you will find two courses, an introduction to Ada and an introduction to SPARK, as well as an ebook “Ada for the C++ or Java Developer”. In the future we plan to add advanced Ada and SPARK courses and also a dedicated course on embedded programming, so watch this space! Learn.adacore.com can also help you get started early and gain a competitive advantage ahead of this year’s Make with Ada competition. ]]> GNAT Community 2018 is here! https://blog.adacore.com/gnat-community-2018 Tue, 26 Jun 2018 13:01:45 +0000 Emma Adby https://blog.adacore.com/gnat-community-2018 Calling all members of the Ada and SPARK community, we are pleased to announce that GNAT Community 2018 is here! adacore.com/download # What’s new? ### BBC micro:bit first class support We decided to adopt the micro:bit as our reference platform for teaching embedded Ada and SPARK. We chose the micro:bit for its worldwide availability, great value for a low price and the included hardware debugger. You can get one at: We did our best to provide a smooth experience when programing the micro:bit in Ada or SPARK. Here is a quick guide on how to get started: • Download and install GNAT arm-elf hosted on your platform: Windows, Linux or MacOS. This package contains the ARM cross compiler as well the required Ada run-times • Download and install GNAT native for your platform: Windows, Linux or MacOS. This package contains the GNAT Programming Studio IDE and an example to run on the micro:bit • Start GNAT Programming Studio • Click on “Create a new Project” • Select the “Scrolling Text” project under “BBC micro:bit” and click Next • Enter the directory where you want the project to be deployed and click Apply • Plug your micro:bit board with a USB cable, and wait for the system to recognize it. This can take a few seconds • Back in GNAT Programming Studio, click on the “flash to board” icon • That’s it! The example provided only uses the LED matrix, for support of more advanced micro:bit features, please have a look at the Ada Drivers Library project. ### RISC-V support The RISC-V open instruction set and its ecosystem are getting more interesting every week, with big names of the tech industry investing in the platform as well as cheap prototyping board being available for makers and hobbyists. After a first experiment last year, AdaCore decided to develop the support of RISC-V in GNAT. In this GNAT Community release we provide support for bare-metal RISC-V 32bit hosted on Linux. In particular, for the HiFive1 board from SiFive. You will also find drivers for this board in the Ada Drivers Library project. ### Find SPARK included in the package by default For the first time in the community release, SPARK is now packaged with the native compiler, making it very easy for everyone to try it out. The three standard provers Alt-Ergo, CVC4 and Z3 are included. ### Windows 64bit is finally here By popular request, we decided to change GNAT Community Windows from 32bit to 64bit. ### Arm-elf hosted on Mac In this release we also add the support for ARM bare-metal hosted on MacOS (previously only Windows and Linux). This includes the support for the BBC micro:bit mentioned above, as well as our usual STM32F4 and STM32F7 boards. Feeling inspired and want to start Making with Ada today? Use the brand new GNAT Community 2018 to get a head start in this years Make with Ada competition! makewithada.org ]]> Security Agency Uses SPARK for Secure USB Key https://blog.adacore.com/security-agency-uses-spark-for-secure-usb-key Mon, 25 Jun 2018 12:51:00 +0000 Yannick Moy https://blog.adacore.com/security-agency-uses-spark-for-secure-usb-key ANSSI, the French national security agency, has published the results of their work since 2014 on designing and implementing an open-hardware & open-source USB key that provides defense-in-depth against vulnerabilities on the USB hardware, architecture, protocol and software stack. In this project called WooKey, Ada and SPARK are key components for the security of the platform. The conference paper (in English), the presentation slides (in French) and a video recording of their presentation (in French) are all available online on the website of the French security conference SSTIC 2018. The complete hardware designs and software code will be available in Q3 2018 in the GitHub project (currently empty). Following the BadUSB vulnerability discloser in 2014 (a USB key can be used to impersonate other devices, and permanently infect all computers it connects to, as well as devices connected to these computers), the only solution to defend against such attacks (to this day) has been to disable USB connections on computers. There are a number of commercial providers of secure USB keys, but their hardware/software stacks are proprietary, so it's not possible to evaluate their level of security. Shortly after the BadUSB disclosure, ANSSI set up an internal project to devise a secure USB key that would restore trust, by being fully open source, based on state-of-the-art practice, yet be affordable for anyone to build/use. The results, four years later, were presented at conference SSTIC 2018 on June 13th. What is interesting is the key role played by the use of safe languages (Ada and SPARK) as well as formal verification (SPARK) to secure the most important services of the EwoK micro-kernel on the USB key, and the combination of these with measures to design a secure software architecture and a secure hardware. They were also quite innovative in their adoption of Ada/SPARK, automatically and progressively replacing units in C with their counterpart in Ada/SPARK in their build system. Something worth noting is that the team discovered Ada/SPARK as part of this project, and managed to prove absence of runtime errors (no buffer overflows!) in their code easily. Arnauld Michelizza from ANSSI will present their work on the EwoK micro-kernel and the software development process they adopted as part of High Integrity Software conference in Bristol on November 6. ]]> How Ada and SPARK Can Increase the Security of Your Software https://blog.adacore.com/how-ada-and-spark-can-increase-the-security-of-your-software Tue, 29 May 2018 12:15:14 +0000 Yannick Moy https://blog.adacore.com/how-ada-and-spark-can-increase-the-security-of-your-software There is a long-standing debate about which phase in the Software Development Life Cycle causes the most bugs: the specification phase or the coding phase? Along with the information on the cost to fix these bugs, answering this question would allow better allocation of QA (Quality Assurance) resources. Furthermore, the cost of bug fixes remains the subject of much debate. A recent study by NIST shows that, in the software industry at large, coding bugs are causing the majority of security issues. They analyzed the provenance of security bugs across all publicly disclosed vulnerabilities in the National Vulnerability Database from 2008 to 2016. They discovered that coding bugs account for two thirds of the total. As they say: The high proportion of implementation errors suggests that little progress has been made in reducing these vulnerabilities that result from simple mistakes, but also that more extensive use of static analysis tools, code reviews, and testing could lead to significant improvement. Our view at AdaCore is that the above list of remedies lacks a critical component for "reducing these vulnerabilities that result from simple mistakes" and probably the most important one: pick a safer programming language! This might not be appropriate for all your software, but why not re-architect your system to isolate the most critical parts and progressively rewrite them with a safer programming language? Better still - design your next system this way in the first place. What safer language to choose? One candidate is Ada, or its SPARK subset. How can they help? We've collected the answers to that question in a booklet to help people and teams who want to use Ada or SPARK for increasing the security of their software. It is freely available here. ]]> Taking on a Challenge in SPARK https://blog.adacore.com/taking-on-a-challenge-in-spark Tue, 08 May 2018 14:01:00 +0000 Johannes Kanig https://blog.adacore.com/taking-on-a-challenge-in-spark Last week, the programmer Hillel posted a challenge (the link points to a partial postmortem of the provided solutions) on Twitter for someone to prove a correct implementation of three small programming problems: Leftpad, Unique, and Fulcrum. This was a good opportunity to compare the SPARK language and its expressiveness and proof power to other systems and paradigms, so I took on the challenge. The good news is that I was able to prove all three solutions and that the SPARK proofs of each complete in no more than 10 seconds. I also believe the Fulcrum solution in particular shows some aspects of SPARK that are especially nice. I will now explain my solutions to each problem, briefly for Leftpad and Unique and in detail for Fulcrum. At the end, I discuss my takeaways from this challenge. ## Leftpad Hillel mentioned that the inclusion of Leftpad into the challenge was kind of a joke. A retracted JavaScript package that implemented Leftpad famously broke thousands of projects back in 2016. Part of the irony was that Leftpad is so simple one shouldn’t depend on a package for this functionality. The specification of Leftpad, according to Hillel, is as follows: Takes a padding character, a string, and a total length, returns the string padded to that length with that character. If length is less than the length of the string, does nothing. It is always helpful to start with translating the specification to a SPARK contract. To distinguish between the two cases (padding required or not), we use contract cases, and arrive at this specification (see the full code in this github repository):  function Left_Pad (S : String; Pad_Char : Character; Len : Natural) return String with Contract_Cases => (Len > S'Length => Left_Pad'Result'Length = Len and then (for all I in Left_Pad'Result'Range => Left_Pad'Result (I) = (if I <= Len - S'Length then Pad_Char else S (I - (Len - S'Length + 1) + S'First))), others => Left_Pad'Result = S); In the case where padding is required, the spec also nicely shows how the result string is composed of both padding chars and chars from the input string. The implementation in SPARK is of course very simple; we can even use an expression function to do it:  function Left_Pad (S : String; Pad_Char : Character; Len : Natural) return String is ((1 .. Len - S'Length => Pad_Char) & S); ## Unique The problem description, as defined by Hillel: Takes a sequence of integers, returns the unique elements of that list. There is no requirement on the ordering of the returned values. An explanation in this blog wouldn’t add anything to the commented code, so I suggest you check out the code for my solution directly. ## Fulcrum The Fulcrum problem was the heart of the challenge. Although the implementation is also just a few lines, quite a lot of specification work is required to get it all proved. The problem description, as defined by Hillel: Given a sequence of integers, returns the index i that minimizes |sum(seq[..i]) - sum(seq[i..])|. Does this in O(n) time and O(n) memory. (Side note: It took me quite a while to notice it, but the above notation seq[..i] apparently means the slice up to and excluding the value at index i. I have taken it instead to mean the slice up to and including the value at index i. Consequently, I used seq[i+1..] for the second slice. This doesn't change the nature or difficulty of the problem.) I’m pleased with the solution I arrived at, so I will present it in detail in this blog post. It has two features that I think none of the other solutions has: • It runs in O(1) space, compared to the O(n) permitted by the problem description and required by the other solutions; • It uses bounded integers and proves absence of overflow; all other solutions use unbounded integers to side-step this issue. Again, our first step is to transform the problem statement into a subprogram specification (please refer to the full solution on github for all the details: there are many comments in the code). For this program, we use arrays to represent sequences of integers, so we first need an array type and a function that can sum all integers in an array. My first try looked like this: type Seq is array (Integer range <>) of Integer; function Sum (S : Seq) return Integer is (if S’Length = 0 then 0 else S (S’First) + Sum (S (S’First + 1 .. S’Last))); So we are just traversing the array via recursive calls, summing up the cells as we go. Assuming an array S of type Seq and an array index I, we could now write the required difference between the left and right sums as follows (assuming S is non-empty, which we assume throughout): abs (Sum (S (S’First .. I)) - Sum (S (I + 1 .. S’Last))) However, there are two problems with this code, and both are specific to how SPARK works. The first problem could be seen as a limitation of SPARK: SPARK allows recursive functions such as Sum, but can’t use them to prove code and specifications that refer to them. This would result in a lot of unproved checks in our code. But my colleague Claire showed me a really nice workaround, which I now present . The idea is to not compute a single sum value, but instead an array of partial sums, where we later can select the partial sum we want. Because such an array can’t be computed in a single expression, we define a new function Sum_Acc as a regular function (not an expression function) with a postcondition:  function Sum_Acc (S : Seq) return Seq with Post => (Sum_Acc'Result'Length = S'Length and then Sum_Acc'Result'First = S'First and then Sum_Acc'Result (S'First) = S (S'First) and then (for all I in S'First + 1 .. S'Last => Sum_Acc'Result (I) = Sum_Acc'Result (I - 1) + S (I))); The idea is that each cell of the result array contains the sum of the input array cells, up to and including the corresponding cell in the input array. For an input array (1,2,3), the function would compute the partial sums (1,3,6). The last value is always the sum of the entire array. In the postcondition, we express that the result array has the same length and bounds as the input array, that the first cell of the result array is always equal to the first cell of the input array, and that each following cell is the sum of the previous cell of the result array, plus the input cell at the same index. The implementation of this Sum_Acc function is straightforward and can be found in the code on github. We also need a Sum_Acc_Rev function which computes the partial sum starting from the end of the array. It is almost the same as Sum_Acc, but the initial (in fact last) value of the array will be zero, owing to the asymmetric definition of the two sums in the initial problem description. You can also find its specification and implementation in the github repository. For our array S and index I, the expression to compute the difference between left and right sum now becomes: abs (Sum_Acc (S) (I) - Sum_Acc_Rev (S) (I)) The second problem is that we are using Integer both as the type of the array cells and the sum result, but Integer is a bounded integer type (32 or 64 bits, depending on your platform), so the sum function will overflow! All other Fulcrum solutions referred to in Hillel’s summary post use unbounded integers, so they avoid this issue. But in many contexts, unbounded integers are unacceptable, because they are unpredictable in space and time requirements and require dynamic memory handling. This is why I decided to increase the difficulty of the Fulcrum problem a little and include proof of absence of overflow. To solve the overflow problem, we need to bound the size of the values to be summed and how many we can sum them. We also need to take into account negative values, so that we also don’t exceed the minimum value. Luckily, range types in SPARK make this simple:  subtype Int is Integer range -1000 .. 1000; subtype Nat is Integer range 1 .. 1000; type Seq is array (Nat range <>) of Int; type Partial_Sums is array (Nat range <>) of Integer; We will use Int as the type for the contents of arrays, and Nat for the array indices (effectively limiting the size of arrays to at most 1000). The sums will still be calculated in Integer, so we need to define a new array type to hold the partial sums (we need to change the return type of the partial sums functions to Partial_Sums to make this work). So if we were to sum the largest possible array with 1000 cells, containing only the the highest (or lowest) value 1000 (or -1000), the absolute value of the sum would not exceed 1 million, which even fits easily into 32 bits. Of course, we could also choose different, and much larger, bounds here. We now can formulate the specification of Find_Fulcrum as follows:  function Find_Fulcrum (S : Seq) return Nat with Pre => S'Length > 0, Post => (Find_Fulcrum'Result in S'Range and then (for all I in S'Range => abs (Sum_Acc (S) (I) - Sum_Acc_Rev (S) (I)) >= abs (Sum_Acc (S) (Find_Fulcrum'Result) - Sum_Acc_Rev (S) (Find_Fulcrum'Result)))); ### The implementation of Fulcrum The Sum_Acc and Sum_Acc_Rev functions we already defined suggest a simple solution to Fulcrum that goes as follows: 1. call Sum_Acc and Sum_Acc_Rev and store the result; 2. compute the index where the difference between the two arrays is the smallest. The problem with this solution is that step (1) takes O(n) space but we promised to deliver a constant space solution! So we need to do something else. In fact, we notice that every program that calls Sum_Acc and Sum_Acc_Rev will be already O(n) in space, so we should never call these functions outside of specifications. The Ghost feature of SPARK lets the compiler check this for us. Types, objects and functions can be marked ghost, and such ghost entities can only be used in specifications (like the postcondition above, or loop invariants and other intermediate assertions), but not in code. This makes sure that we don’t call these functions accidentally, slowing down our code. Marking a function ghost can be done just by adding with Ghost to its declaration. The constant space implementation idea is that if, for some index J - 1, we have the partial sums Left_Sum = Sum_Acc (J - 1) and Right_Sum = Sum_Acc_Rev (J - 1), we can compute the values for the next index J + 1 by simply adding S (J) to Left_Sum and subtracting it from Right_Sum. This simple idea gives us the core of the implementation: for I in S'First + 1 .. S'Last loop Left_Sum := Left_Sum + S (I); Right_Sum := Right_Sum - S (I); if abs (Left_Sum - Right_Sum) < Min then Min := abs (Left_Sum - Right_Sum); Index := I; end if; end loop; return Index; To understand this code, we also need to know that Min holds the current minimal difference between the two sums and Index gives the array index where this minimal difference occurred. In fact, the previous explanations of the code can be expressed quite nicely using this loop invariant (it also holds at the beginning of the loop): pragma Loop_Invariant (Left_Sum = Sum_Acc (S) (I - 1) and then Right_Sum = Sum_Acc_Rev (S) (I - 1) and then Min = abs (Sum_Acc (S) (Index) - Sum_Acc_Rev (S) (Index)) and then (for all K in S'First .. I - 1 => abs (Sum_Acc (S) (K) - Sum_Acc_Rev (S) (K)) >= abs (Sum_Acc (S) (Index) - Sum_Acc_Rev (S) (Index)))); The only part that’s missing is the initial setup, so that the above conditions hold for I = S’First. For the Right_Sum variable, this requires a traversal of the entire array to compute the initial right sum. For this we have written a helper function Sum which is O(n) time and O(1) space. So we end up with this initialization code for Find_Fulcrum: Index : Nat := S'First; Left_Sum : Integer := S (S'First); Right_Sum : Integer := Sum (S); Min : Integer := abs (Left_Sum - Right_Sum); and it can be seen that these initial values establish the loop invariants for the first iteration of the loop (where I = S’First + 1, so I - 1 = S’First). ## Some metrics It took me roughly 15 minutes to come up with the Leftpad proof. I don’t have exact numbers for the two other problems, but I would guess roughly one hour for Unique and 2-3 hours for Fulcrum. Fulcrum is about 110 lines of code, including specifications, but excluding comments and blank lines. An implementation without any contracts would be at 35 lines, so we have a threefold overhead of specification to code … though that’s typical for specification of small algorithmic problems like Fulcrum. All proofs are done automatically, and SPARK verifies each example in less than 10 seconds. ## Some thoughts about the challenge First of all, many thanks to Hillel for starting that challenge and to Adrian Rueegsegger for bringing it to my attention. It was fun to do, and I believe SPARK did reasonably well in this challenge. Hillel’s motivation was to counter exaggerated praise of functional programming (FP) over imperative programming (IP). So he proved these three examples in imperative style and challenged the functional programming community to do the same. In this context, doing the problems in SPARK was probably besides the point, because it’s not a functional language. Going beyond the FP vs IP question asked in the challenge, I think we can learn a lot by looking at the solutions and the discussion around the challenge. This blog post by Neelakantan Krishnaswami argues that the real issue is the combination of language features, in particular aliasing in combination with side effects. If you have both, verification is hard. One approach is to accept this situation and deal with it. In Frama-C, a verification tool for C, it is common to write contracts that separate memory regions. Some tools are based on separation logic, which directly features separating conjunction in the specification language. Both result in quite complex specifications, in my opinion. Or you can change the initial conditions and remove language features. Functional programming removes side effects to make aliasing harmless. This has all kinds of consequences, for example the need for a garbage collector and the inability to use imperative algorithms from the literature. SPARK keeps side effects but removes aliasing, essentially by excluding pointers and a few other language rules. SPARK makes up for the loss of pointers by providing built-in arrays and parameter modes (two very common usages of pointers in C), but shared data structures still remain impossible to write in pure SPARK: one needs to leave the SPARK subset and go to full Ada for that. Rust keeps pointers, but only non-aliasing ones via its borrow checker; however, formal verification for Rust is still in very early stages as far as I know. The SPARK team is also working on support for non-aliasing pointers. Beyond language features, another important question is the applicability of a language to a domain. Formal verification of code matters most where a bug would have a big impact. Such code can be found in embedded devices, for example safety-critical code that makes sure the device is safe to use (think of airplanes, cars, or medical devices). Embedded programming, in particular safety-critical embedded programming, is special because different constraints apply. For example, execution times and memory usage of the program must be very predictable, which excludes languages that have a managed approach to memory (memory usage becomes less predictable) with a GC (often, the GC can kick in at any time, which makes execution times unpredictable). In these domains, functional languages can’t really be applied directly (but see the use of Haskell in sel4). Unbounded integers can’t be used either because of the same issue - memory usage can grow if some computation yields a very large result, and execution time can vary as well. These issues were the main motivation for me to provide a O(1) space solution that uses bounded integers for the Fulcrum problem and proving absence of overflow. Programming languages aren’t everything either. Another important issue is tooling, in particular proof automation. Looking at the functional solutions of Fulcrum (linked from Hillel’s blog post), they contain a lot of manual proofs. The Agda solution is very small despite this fact, though it uses a simple quadratic algorithm; I would love to see a variant that’s linear. I believe that for formal verification to be accepted in industrial projects, most proofs must be able to be completed automatically, though some manual effort is acceptable for a small percentage of the proof obligations. The Dafny and SPARK solutions are the only ones (as far as I could see) that fare well in this regard. Dafny is well-known for its excellent proof automation via Boogie and Z3. SPARK also does very well here, all proofs being fully automatic. ]]> PolyORB now lives on Github https://blog.adacore.com/polyorb-now-lives-on-github Wed, 18 Apr 2018 15:52:00 +0000 Thomas Quinot https://blog.adacore.com/polyorb-now-lives-on-github PolyORB, AdaCore's versatile distribution middleware, now lives on Github. Its new home is https://github.com/AdaCore/polyorb PolyORB is a development toolsuite and a library of runtime components that implements several distribution models, including CORBA and the Ada 95 Distributed Systems Annex. Originally developed as part of academic research at Telecom ParisTech, it became part of the GNAT Pro family in 2003. Since then, it has been used in a number of industrial applications in a wide variety of domains such as: * air traffic flow management * enterprise document management * scientific data processing in particle physics experiments AdaCore has always been committed to involving the user community in the development of PolyORB. Over the past 15 years, many contributions from industrial as well as hobbyist users have been integrated, and community releases were previously made available in conjunction with GNAT GPL. Today we are pleased to further this community engagement and renew our commitment to an open development process by making the PolyORB repository (including full history) available on Github. This will allow users of GNAT GPL to benefit from the latest developments and contribute fixes and improvements. We look forward to seeing your issues and pull requests on this repository! ]]> SPARKZumo Part 2: Integrating the Arduino Build Environment Into GPS https://blog.adacore.com/sparkzumo-part-2-integrating-the-arduino-build-environment-into-gps Wed, 04 Apr 2018 04:00:00 +0000 Rob Tice https://blog.adacore.com/sparkzumo-part-2-integrating-the-arduino-build-environment-into-gps This is part #2 of the SPARKZumo project where we go through how to actually integrate a CCG application in with other source code and how to create GPS plugins to customize features like automating builds and flashing hardware. To read more about the software design of the project visit the other blog post here. # The Build Process At the beginning of our build process we have a few different types of source files that we need to bring together into one binary, Ada/SPARK, C++, C, and an Arduino sketch. During a typical Arduino build, the build system converts the Arduino sketch into valid C++ code, brings in any libraries (user and system) that are included in the sketch, synthesizes a main, compiles and links that all together with the Arduino runtime and selected BSP, and generates the resulting executable binary. The only step we are adding to this process is that we need to run CCG on our SPARK code to generate a C library that we can pass to the Arduino build as a valid Arduino library. The Arduino sketch then pulls the resulting library into the build via an include. **The CCG tool is available as part of a GNATPro product subscription and is not included with the GNAT Community release.** ## Build Steps From the user’s perspective, the steps necessary to build this application are as follows: 1. Run CCG on the SPARK/Ada Code to produce C files and Ada Library Information files, or ali files. For more information on these files, see the GNAT Compilation Model documentation. 2. Copy the resulting C files into a directory structure valid for an Arduino library 1. We will use the lib directory in the main repo to house the generated Arduino library. 3. Run c-gnatls on the ali files to determine which runtime files our application depends on. 4. Copy those runtime files into the Arduino library structure. 5. Make sure our Arduino sketch has included the header files generated by the CCG tool. 6. Run the arduino-builder tool with the appropriate options to tell the tool where our library lives and which board we are compiling for. 1. The arduino-builder tool will use the .build directory in the repo to stage the build 7. Then we can flash the result of the compilation to our target board. That seems like a lot of work to do every time we need to make a change to our software! Since these steps are the same every time, we can automate this. Since we should try to make this as host agnostic as possible, meaning we would like for this to be used on Windows and Linux, we should use a scripting language which is fairly host agnostic. It would also be nice if we could integrate this workflow into GPS so that we can develop our code, prove our code, and build and flash our code without leaving our IDE. It is an Integrated Development Environment after all. ## Configuration Files The arduino-builder program is the command line version of the Arduino IDE. When you build an application with the Arduino IDE it creates a build.options.json file with the options you select from the IDE. These options include the location of any user libraries, the hardware to build for, where the toolchain lives, and where the sketch lives. We can pass the same options to the arduino-builder program or we can pass it the location of a build.options.json file. For this application I put a build.options.json file in the conf directory of the repository. This file should be configured properly for your build system. The best way, I have found, to get this file configured properly is to install the Arduino IDE and build one of the example applications. Then find the generated build.options.json file generated by the IDE and copy that into the conf directory of the repository. You then only need to modify: 1. The “otherLibrariesFolders” to point to the absolute path of the lib folder in the repo. 2. The”sketchLocation” to point at the SPARKZumo.ino file in the repo. The other conf files in the conf directory are there to configure the flash utilities. When flashing the AVR on the Arduino Uno, the avrdude flash utility is used. This application takes the information from the flash.yaml file and the path of the avrdude.conf file to configure the flash command. Avrdude uses this to inform the flashing utility about the target hardware. The HiFive board uses openocd as its flashing utility. The openocd.cfg file has all the necessary configuration information that is passed to the openocd tool for flashing. # The GPS Plugin [DISCLAIMER: This guide assumes you are using version 18.1 or newer of GPS] Under the hood, GPS, or the GNAT Programming Studio, has a combination of Ada, graphical frameworks, and Python scripting utilities. Using the Python plugin interface, it is very easy to add functionality to our GPS environment. For this application we will add some buttons and menu items to automate the process mentioned above. We will only be using a small subset of the power of the Python interface. For a complete guide to what is possible you can visit the Customizing and Extending GPS and Scripting API Reference for GPS sections of the GPS User’s Guide. ## Plugin Installation Locations Depending on your use case you can add Python plugins in a few locations to bring them into your GPS environment. There are already a handful of plugins that come with the GPS installation. You can find the list of these plugins by going to Edit->Preferences and navigating to the Plugin tab (near the bottom of the preferences window on the left sidebar). Because these plugins are included with the installation, they live under the installation directory in <installation directory>/share/gps/plug-ins. If you would like to modify you installation, you can add your plugins here and reload GPS. They will then show up in the plugin list. However, if you reinstall GPS, it will overwrite your plugin! There is a better place to put your plugins such that they won’t disappear when you update your GPS installation. GPS adds a folder to your Home directory which includes all your user defined settings for GPS, such as your color theme, font settings, pretty printer settings, etc. This folder, by default, lives in <user’s home directory>/.gps. If you navigate to this folder you will see a plug-ins folder where you can add your custom plugins. When you update your GPS installation, this folder persists. Depending on your application, there may be an even better place to put your plugin. For this specific application we really only want this added functionality when we have the SPARKzumo project loaded. So ideally, we want the plugin to live in the same folder as the project, and to load only when we load the project. To get this functionality, we can name our plugin <project file name>.ide.py and put it in the same directory as our project. When GPS loads the project, it will also load the plugin. For example, our project file is named zumo.gpr, so our plugin should be called zumo.ide.py. The source for the zumo.ide.py file is located here. ## The Plugin Skeleton When GPS loads our plugin it will call the initialize_project_plugin function. We should implement something like this to create our first button: import GPS import gps_utils class ArduinoWorkflow: def __somefunction(self): # do stuff here def __init__(self): gps_utils.make_interactive( callback=self.__somefunction, category="Build", name="Example", toolbar='main', menu='/Build/Arduino/' + "Example", description="Example") def initialize_project_plugin(): ArduinoWorkflow() This simple class will create a button and a menu item with the text Example. When we click this button or menu item it will callback to our somefunction function. Our actual plugin creates a few buttons and menu items that look like this: ## Task Workflows Now that we have the ability to run some scripts by clicking buttons we are all set! But there’s a problem; when we execute a script from a button, and the script takes some time to perform some actions, GPS hangs waiting for the script to complete. We really should be executing our script asynchronously so that we can still use GPS while we are waiting for the tasks to complete. Python has a nice feature called coroutines which can allow us to run some tasks asynchronously. We can be super fancy and implement these coroutines using generators! Or… ### ProcessWrapper GPS has already done this for us with the task_workflow interface. The task_workflow call wraps our function in a generator and will asynchronously execute parts of our script. We can modify our somefunction function now to look like this: def __somefunction(self, task): task.set_progress(0, 1) try: proc = promises.ProcessWrapper(["script", "arg1", "arg2"], spawn_console="") except: self.__error_exit("Could not launch script.") return ret, output = yield proc.wait_until_terminate() if ret is not 0: self.__error_exit("Script returned an error.") return task.set_progress(1, 1) In this function we are going to execute a script called script and pass 2 arguments to it. We wrap the call to the script in a ProcessWrapper which returns a promise. We then yield on the result. The process will run asynchronously, and the main thread will transfer control back to the main process. When the script is complete, the yield returns the stdout and exit code of the process. We can even feed some information back to the user about the progress of the background processes using the task.set_progress call. This registers the task in the task window in GPS. If we have many tasks to run, we can update the task window after each task to tell the user if we are done yet. ### TargetWrapper The ProcessWrapper interface is nice if we need to run an external script but what if we want to trigger the build or one of the gnat tools? #### Triggering CCG Just for that, there’s another interface: TargetWrapper. To trigger the build tools, we can run something like this: builder = promises.TargetWrapper("Build All") retval = yield builder.wait_on_execute() if retval is not 0: self.__error_exit("Failed to build all.") return With this code, we are triggering the same action as the Build All button or menu item. #### Triggering GNATdoc We can also trigger the other tools within the GNAT suite using the same technique. For example, we can run the GNATdoc tool against our project to generate the project documentation: gnatdoc = promises.TargetWrapper("gnatdoc") retval = yield gnatdoc.wait_on_execute(extra_args=["-P", GPS.Project.root().file().path, "-l"]) if retval is not 0: self.__error_exit("Failed to generate project documentation.") return Here we are calling gnatdoc with the arguments listed in extra_args. This command will generate the project documentation and put it in the directory specified by the Documentation_Dir attribute of the Documentation package in the project file. In this case, I am putting the docs in the docs folder of the repo so that my GitHub repo can serve those via a GitHub Pages website ## Accessing Project Configuration The file that drives the GNAT tools is the GNAT Project file, or the gpr file. This file has all the information necessary for GPS and CCG to process the source files and build the application. We can access all of this information from the plugin as well to inform where to find the source files, where to find the object files, and what build configuration we are using. For example, to access the list of source files for the project we can use the following Python command: GPS.Project.root().sources(). Another important piece of information that we would like to get from the project file is the current value assigned to the “board” scenario variable. This will tell us if we are building for the Arduino target or the HiFive target. This variable will change the build configuration that we pass to arduino-builder and which flash utility we call. We can access this information by using the following command: GPS.Project.root().scenario_variables(). This will return a dictionary of all scenario variables used in the project. We can then access the “board” scenario variable using the typical Python dictionary syntax GPS.Project.root().scenario_variables()[‘board’]. ## Determining Runtime Dependencies Because we are using the Arduino build system to build the output of our CCG tool, we will need to include the runtime dependency files used by our CCG application in the Arduino library directory. To detect which runtime files we are using we can run the c-gnatls command against the ali files generated by the CCG tool. This will output a set of information that we can parse. The output of c-gnatls on one file looks something like this $ c-gnatls -d -a -s obj/geo_filter.ali
types.ads

When we parse this output we will have to make sure we run c-gnatls against all ali files generated by CCG, we will need to strip out any files listed that are actually part of our sources already, and we will need to remove any duplicate dependencies. The c-gnatls tool also lists the Ada versions of the runtime files and not the C versions. So we need to determine the C equivalents and then copy them into our Arduino library folder. The __get_runtime_deps function is responsible for all of this work.

## Generating Lookup Tables

If you had a chance to look at the first blog post in this series, I talked about a bit about code in this application that was used to do some filtering of discrete states using a graph filter. This involved mapping some states onto some physical geometry and sectioning off areas that belonged to different states. The outcome of this was to map each point in a 2D graph to some state using a lookup table.

To generate this lookup table I used a python library called shapely to compute the necessary geometry and map points to states. Originally, I had this as a separate utility sitting in the utils folder in the repo and would copy the output of this program into the geo_filter.ads file by hand. Eventually, I was able to bring this utility into the plugin workflow using a few interesting features of GPS.

### GPS includes pip

Even though GPS has the Python env embedded in it, you can still bring in outside packages using the pip interface. The syntax for installing an external dependency looks something like:

import pip
ret = pip.main(["install"] + dependency)

Where dependency is the thing you are looking to install. In the case of this plugin, I only need the shapely library and am installing that when the GPS plugin is initialized.

The Libadalang library is now included with GPS and can be used inside your plugin. Using the libadalang interface I was able to access the value of user defined named numbers in the Ada files. This was then passed to the shapely application to compute the necessary geometry.

ctx = lal.AnalysisContext()
unit = ctx.get_from_file(file_to_edit)
myVarNode = unit.root.findall(lambda n: n.is_a(lal.NumberDecl) and n.f_ids.text=='my_var')
value = int(myVarNode[0].f_expr.text)

This snippet creates a new Libadalang analysis context, loads the information from a file and searches for a named number declaration called ‘my_var’. The value assigned to ‘my_var’ is then stored in our variable value.

I was then able to access the location where I wanted to put the output of the shapely application using Libadalang:

array_node = unit.root.findall(lambda n: n.is_a(lal.ObjectDecl) and n.f_ids.text=='my_array')
agg_start_line = int(array_node[0].f_default_expr.sloc_range.start.line)
agg_start_col = int(array_node[0].f_default_expr.sloc_range.start.column)
agg_end_line = int(array_node[0].f_default_expr.sloc_range.end.line)
agg_end_col = int(array_node[0].f_default_expr.sloc_range.end.column)

This gave me the line and column number of the start of the array aggregate initializer for the lookup table ‘my_array’.

### Editing Files in GPS from the Plugin

Now that we have the computed lookup table, we could use the typical python file open mechanism to edit the file at the location obtained from Libadalang. But since we are already in GPS, we could just use the GPS.EditorBuffer interface to edit the file. Using the information from our shapely application and the line and column information obtained from Libadalang we can do this:

buf = GPS.EditorBuffer.get(GPS.File(file_to_edit))
agg_start_cursor = buf.at(agg_start_line, agg_start_col)
agg_end_cursor = buf.at(agg_end_line, agg_end_col)
buf.delete(agg_start_cursor, agg_end_cursor)
array_str = "(%s));" % ("(%s" % ("),\n(".join([', '.join([item for item in row]) for row in array])))
buf.insert(agg_start_cursor, array_str[agg_start_col - 1:])

First we open a buffer to the file that we want to edit. Then we create a GPS.Location for the beginning and end of the current array aggregate positions that we obtained from Libadalang. Then we remove the old information in the buffer. We then turn the array we received from our shapely application into a string and insert that into the buffer.

We have just successfully generated some Ada code from our GPS plugin!

# Writing Your Own Python Plugin

Most probably, there is already a plugin that exists in the GPS distribution that does something similar to what you want to do. For this plugin, I used the source for the plugin that enables flashing and debugging of bare-metal STM32 ARM boards. This file can be found in your GPS installation at <install directory>/share/gps/support/ui/board_support.py. You can also see this file on the GPS GitHub repository here.

In most cases, it makes sense to search through the plugins that already exist to get a starting point for your specific application, then you can fill in the blanks from there. You can view the entire source of GPS on AdaCore’s Github repository

That wraps up the overview of the build system for this application. The source for the project can be found here. Feel free to fork this project and create new and interesting things.

Happy Hacking!

]]>

One of the most criticized aspect of the Ada language throughout the years has been its outdated syntax. Fortunately, AdaCore decided to tackle this issue by implementing a new, modern, syntax for Ada.

The major change is the use of curly braces instead of begin/end. Also the following keywords have been shortened:

• return becomes ret
• function becomes fn
• is becomes :
• with becomes include

For instance, the following function:

with Ada.Numerics;

function Fools (X : Float) return Float is
begin
end;

is now written:

include Ada.Numerics;

fn Fools (X : Float) ret Float :
{
};

This modern syntax is a major milestone in the adoption of Ada. John Dorab recently discovered the qualities of Ada:

I have an eye condition that prevents me from reading code without curly braces. Thanks to this new syntax, I can now benefit from the advanced type system, programming by contract, portability, functional safety, [insert more cool Ada features here] of Ada. Also, it looks like most other programming languages, so it must be better.

This new syntax is also a boost in productivity for Ada developers. Mr Fisher testifies:

I write at around 10 lines of code per day. With this new syntax I save up to 30 keystrokes. That’s at a huge increase to my productivity! The code is less readable for debugging, code reviews and maintenance in general, but I write a little bit more of it.

The standardization effort related to this new syntax is expected to start in the coming year and to last a few years. In order to allow early adopters to get their hands on Ada without having to wait for the next standard, we have created a font that allows you to display Ada code with the new syntax:

The font contains other useful ligatures, like displaying the Ada assignment operator “:=” as “=”, and the Ada equality operator “=” as “==”. It was created based on Courier New, using Glyphr Studio and cloudconvert. It’s work in progress, so feel free to extend it! It is attached below.

In the future, we also plan to go beyond a pure syntactic layer with Libadalang. For example, we could emulate the complex type promotions/conversions rules of C by inserting Unchecked_Conversion calls each time the programmer tries to convert a value to an incompatible type. If you have other ideas, please let us know in the comments below!

### Attachments

]]>

There are a lot of DIY CNC projects out there (router, laser, 3D printer, egg drawing, etc.), but I never saw a DIY CNC sandblaster. So I decided to make my own.

# Hardware

The CNC frame is one of those cheap kits that you can get on ebay for instance. Mine was around 200 euros, and it is actually a good value for the price. I built the kit and then replaced the electronic controller with an STM32F469 discovery board and an arduino CNC shield.

For the sandblaster itself, my father and I hacked this simple solution made from a soda bottle and pipes/fittings that you can find in any hardware store.

The sand is falling from the tank in a small tube mostly thanks to gravity. The sand tank still needs to be pressurised to avoid air coming up from the nozzle.

The sandblaster was then mounted to the CNC frame where the engraving spindle is supposed to be, and the sand tank is somewhat fixed above the machine. As you can shortly see on the video, I’m controlling the airflow manually as I didn’t have a solenoid valve to make the machine fully autonomous.

# Software

On the software side I re-used my Ada Gcode controller from a previous project. I still wanted to add something to it, so this time I used a board with a touch screen to create a simple control interface.

# Conclusion

This machine is actually not very practical. The 1.5 litre soda bottle holds barely enough sand to write 3 letters and the dust going everywhere will jam the machine after a few minutes of use. But this was a fun project nonetheless!

PS: Thank you dad for letting me use your workshop once again ;)

]]>

So you want to use SPARK for your next microcontroller project? Great choice! All you need is an Ada 2012 ready compiler and the SPARK tools. But what happens when an Ada 2012 compiler isn’t available for your architecture?

This was the case when I started working on a mini sumo robot based on the Pololu Zumo v1.2

The chassis is complete with independent left and right motors with silicone tracks, and a suite of sensors including an array of infrared reflectance sensors, a buzzer, a 3-axis accelerometer, magnetometer, and gyroscope. The robot’s control interface uses a pin-out and footprint compatible with Arduino Uno-like microcontrollers. This is super convenient, because I can use any Arduino Uno compatible board, plug it into the robot, and be ready to go. But the Arduino Uno is an AVR, and there isn’t a readily available Ada 2012 compiler for AVR… back to the drawing board…

Or…

What if we could still write SPARK code and be able to compile it into some C code. Then use the Arduino compiler to compile and link this code in with the Arduino BSPs and runtimes? This would be ideal because I wouldn’t need to worry about writing a BSP for the board I am using, and I would only have to focus on the application layer. And I can use SPARK! Luckily, AdaCore has a solution for exactly this!

# CCG to the rescue!

The Common Code Generator, or CCG, was developed to solve the issue where an Ada compiler is not available for a specific architecture, but a C compiler is readily available. This is the case for architectures like AVR, PIC, Renesas, and specialized DSPs from companies like TI and Analog Devices. CCG can take your Ada or SPARK code, and “compile” it to a format that the manufacturer’s supplied C compiler can understand. With this technology, we now have all of the benefits of Ada or SPARK on any architecture. **The CCG tool is available as part of a GNATPro product subscription and is not included with the GNAT Community release.**

Note that this is not fundamentally different from what’s already happening in a compiler today. Compilation is essentially a series of translations from one language to the other, each one being used for specific optimization or analysis phase. In the case of GNAT for example the process is as follows:

1. The Ada code is first translated into a simplified version of Ada (called the expanded tree).

2. Then into the gcc tree format which is common to all gcc-supported languages.

3. Then into a format ideal for computing optimizations called gimple.

4. Then into a generic assembly language called RTL.

5. And finally to the actual target assembler.

With CCG, C becomes one of these intermediate languages, with GNAT taking care of the initial compilation steps and a target compiler taking care of the final ones. One important consequence of this is that the C code is not intended to be maintainable or modified. CCG is not a translator from Ada or SPARK to C, it’s a compiler, or maybe half a compiler.

There are some limitations to this though, that are important to know, which are today mostly due to the fact that the technology is very young and targets a subset of Ada. Looking at the limitations more closely, they resemble the limitations imposed by the SPARK language subset on a zero-footprint runtime. I would generally use the zero-footprint runtime in an environment where the BSP and runtime were supplied by a vendor or an RTOS, so this looks like a perfect time to use CCG to develop SPARK code for an Arduino supported board using the Arduino BSP and runtime support.  For a complete list of supported and unsupported constructs you can visit the CCG User’s Guide.

Another benefit I get out of this setup is that I am using the Arduino framework as a hardware abstraction layer. Because I am generating C code and pulling in Arduino library calls, theoretically, I can build my application for many processors without changing my application code. As long as the board is supported by Arduino and is pin compatible with my hardware, my application will run on it!

# Abstracting the Hardware

For this application I looked at targeting two different architectures, the Arduino Uno Rev 3 which has an ATmega328p on board, and a SiFive HiFive1 which has a Freedom E310 on board. These were chosen because they are pin compatible but are massively different from the software perspective. The ATmega328p is a 16 bit AVR and the Freedom E310 is a 32 bit RISC-V. The system word size isn’t even the same! The source code for the project is located here.

In order to abstract the hardware differences away, two steps had to be taken:

1. I used a target configuration file to tell the CCG tool how to represent data sizes during the code generation. By default, CCG assumes word sizes based on the default for the host OS. To compile for the 16 bit AVR, I used the target.atp file located in the base directory to inform the tool about the layout of the hardware. The configuration file looks like this:
2. Bits_BE                       0
Bits_Per_Unit                 8
Bits_Per_Word                16
Bytes_BE                      0
Char_Size                     8
Double_Float_Alignment        0
Double_Scalar_Alignment       0
Double_Size                  32
Float_Size                   32
Float_Words_BE                0
Int_Size                     16
Long_Double_Size             32
Long_Long_Size               64
Long_Size                    32
Maximum_Alignment            16
Max_Unaligned_Field          64
Pointer_Size                 32
Short_Enums                   0
Short_Size                   16
Strict_Alignment              0
System_Allocator_Alignment   16
Wchar_T_Size                 16
Words_BE                      0
float         15  I  32  32
double        15  I  32  32
3. The bsp folder contains all of the differences between the two boards that were necessary to separate out. This is also where the Arduino runtime calls were pulled into the Ada code. For example, in bsp/wire.ads you can see many pragma Import calls used to bring in the Arduino I2C calls located in wire.h.

In order to tell the project which version of these files to use during the compilation, I created a scenario variable in the main project, zumo.gpr

type Board_Type is ("uno", "hifive");
Board : Board_Type := external ("board", "hifive");

Common_Sources := ("src/**", "bsp/");
Target_Sources := "";
case Board is
when "uno" =>
Target_Sources := "bsp/atmega328p";
when "hifive" =>
Target_Sources := "bsp/freedom_e310-G000";
end case;

for Source_Dirs use Common_Sources & Target_Sources;

# Software Design

## Interaction with Arduino Sketch

A typical Arduino application exposes two functions to the developer through the sketch file: setup and loop. The developer would fill in the setup function with all of the code that should be run once at start-up, and then populates the loop function with the actual application programming. During the Arduino compilation, these two functions get pre-processed and wrapped into a main generated by the Arduino runtime. More information about the Arduino build process can be found here.

Because we are using the Arduino runtime we cannot have the actual main entry point for the application in the Ada code (the Arduino pre-processor generates this for us). Instead, we have an Arduino sketch file called SPARKZumo.ino which has the typical Arduino setup() and loop() functions. From setup() we need to initialize the Ada environment by calling the function generated by the Ada binder, sparkzumoinit(). Then, we can call whatever setup sequence we want.

CCG maps Ada package and subprogram namespacing into C-like namespacing, so package.subprogram in Ada would become package__subprogram() in C. The setup function we are calling in the sketch is sparkzumo.setup in Ada, which becomes sparkzumo__setup() after CCG generates the files. The loop function we are calling in the sketch is sparkzumo.workloop in Ada, which becomes sparkzumo__workloop().

## Handling Exceptions

Even though we are generating C code from Ada, the CCG tool can still expand the Ada code to include many of the compiler generated checks associated with Ada code before generating the C code. This is very cool because we still have much of the power of the Ada language even though we are compiling to C.

If any of these checks fail at runtime, the __gnat_last_chance_handler is called. The CCG system supplies a definition for what this function should look like, but leaves the implementation up to the developer. For this application, I put the handler implementation in the sketch file, but am calling back into the Ada code from the sketch to perform more actions (like blink LEDs and shut down the motors). If there is a range check failure, or a buffer overflow, or something similar, my __gnat_last_chance_handler will dump some information to the serial port then call back into the Ada code to  shut down the motors, and flash an LED on an infinite loop. We should never need this mechanism because since we are using SPARK in this application, we should be able to prove that none of these will ever occur!

## Standard.h file

The minimal runtime that does come with the CCG tool can be found in the installation directory under the adalib folder. Here you will find the C versions of the Ada runtimes files that you would typically find in the adainclude directory.

The important file to know about here is the standard.h file. This is the main C header file that will allow you to map Ada to C constructs. For instance, this header file defines the fatptr construct used under Ada arrays and strings, and other integral types like Natural, Positive, and Boolean.

You can and should modify this file to fit within your build environment. For my application, I have included the Arduino.h at the top to bring in the Arduino type system and constructs. Because the Arduino framework defines things like Booleans, I have commented out the versions defined in the standard.h file so that I am consistent with the rest of the Arduino runtime. You can find the edited version of the standard.h file for this project in the src directory.

## Drivers

For the application to interact with all of the sensors available on the robot, we need a layer between the runtime and BSP, and the algorithms. The src/drivers directory contains all of the code necessary to communicate with the sensors and motors. Most of the initial source code for this section was a direct port from the zumo-shield library that was originally written in C++. After porting to Ada, the code was modified to be more robust by refactoring and adding SPARK contracts.

## Algorithms

Even though this is a sumo robot, I decided to start with a line follower algorithm for the proof of concept. The source code for the line follower algorithm can be found in src/algos/line_finder. The algorithm was originally a direct port of the Line Follow example in the zumo-shield examples repo.

The C++ version of this algorithm worked ok but wasn’t really able to handle occasions where the line was lost, or the robot came to a fork, or an intersection. After refactoring and adding SPARK features, I added a detection lookup so that the robot could determine what type of environment the sensors were looking at. The choices are: Lost (meaning no line is found), Online (meaning there’s a single line), Fork (two lines diverge), BranchLeft (left turn), BranchRight (right turn), Perpendicular intersection (make a decision to go left or right), or Unknown (no clue what to do, let’s keep doing what we were doing and see what happens next). After detecting a change in state, the robot would make a decision like turn left, or turn right to follow a new line. If the robot was in a Lost state, it would go into a “re-finding” algorithm where it would start to do progressively larger circles.

This algorithm worked ok as well, but was a little strange. Occasionally, the robot would decide to change direction in the middle of a line, or start to take a branch and turn back the other way. The reason for this was that the robot was detecting spurious changes in state and reacting to them instantaneously. We can call this state noise. In order to minimize this state noise, I added a state low-pass filter using a geometric graph filter.

## The Geometric Graph Filter

If you ask a mathematician they will probably tell you there’s a better way to filter discrete states than this, but this method worked for me! Lets picture mapping 6 points corresponding to the 6 detection states onto a 2d graph, spacing them out evenly along the perimeter of a square. Now, let’s say we have a moving window average with X positions. Each time we get a state reading from the sensors we look up the corresponding coordinate for that state in the graph and add the coordinate to the window. For instance, if we detect a Online state our corresponding coordinate is (15, 15). If we detect a Perpendicular state our coordinate is (-15, 0). And so on. If we average over the window we will end up with a coordinate somewhere in the inside of the square. If we then section off the area of the square into sections, and assign each section to map to the corresponding state, we will then find that our average is sitting in one of those sections that maps to one of our states.

For an example, let’s assume our window is 5 states wide and we have detected the following list of states (BranchLeft, BranchLeft, Online, BranchLeft, Lost). If we map these to coordinates we get the following window: ((-15, 15), (-15, 15), (15, 15), (-15, 15), (-15, -15)). When we average these coordinates in the window we get a point with the coordinates (-9, 9). If we look at our lookup table we can see that this coordinate is in the BranchLeft polygon.

One issue that comes up here is that when the average point moves closer to the center of the graph, there’s high state entropy, meaning our state can change more rapidly and noise has a higher effect. To solve this, we can hold on to the previous calculated state, and if the new calculated state is somewhere in the center of the graph, we throw away the new calculation and pass along the previous calculation. We don’t purge the average window though so that if we get enough of one state, the average point can eventually migrate out to that section of the graph.

To avoid having to calculate this geometry every time we get a new state, I generated a lookup table which maps every point in the polygon to a state. All we have to do is calculate the average in the window and do the lookup at runtime. There are some python scripts that are used to generate most of the src/algos/line_finder/geo_filter.ads file. This script also generates a visual of the graph. For more information on these scripts, see part #2 [COMING SOON!!] of this blog post! One issue that I ran into was that I had to use a very small graph which decreased my ability to filter. This is because the amount of RAM I had available on the Arduino Uno was very small. The larger the graph, the larger the lookup table, the more RAM I needed.

There are a few modifications to this technique that could be done to make it more accurate and more fair. Using a square and only 2 dimensions to map all the states means that the distance between any two states is different than the distance between any other 2 states. For example, it’s easier to switch between BranchLeft and Online than it is to switch between BranchLeft and Fork. For the proof of concept this technique worked well though.

# Future Activity

The code still needs a bit of work to get the IMU sensors up and going. We have another project called the Certyflie which has all of the gimbal calculations to synthesize roll, pitch, and yaw data from an IMU. The Arduino Uno is a bit too weak to perform these calculations properly. One issue is that there is no floating point unit on the AVR. The RISC-V has an FPU and is much more powerful. One option is to add a bluetooth transceiver to the robot and send the IMU data back to a terminal on a laptop for synthesization.

Another issue that came up during this development is that the HiFive board uses level shifters on all of the GPIO lines. The level shifters use internal pull-ups which means that the processor cannot read the reflectance sensors. The reflectance sensor is actually just a capacitor that is discharged when light hits the substrate. So to read the sensor we need to pull the GPIO line high to charge the capacitor then pull it low and read the amount of time it takes to discharge. This will tell us how much light is hitting the sensor. Since the HiFive has the pull ups on the GPIO lines, we can’t pull the line low to read the sensor. Instead we are always charging the sensor. More information about this process can be found on the IR sensor manufacturer’s website under How It Works.

There will be a second post coming soon, which will describe how to actually build this crazy project. There I detail the development of the GPS plugin that I used to build everything and flash the board. As always, the code for the entire project is available here: https://github.com/Robert-Tice/SPARKZumo

Happy Hacking!

]]>
Two Days Dedicated to Sound Static Analysis for Security https://blog.adacore.com/sound-static-analysis-for-security Wed, 14 Mar 2018 15:37:00 +0000 Yannick Moy https://blog.adacore.com/sound-static-analysis-for-security

AdaCore has been working with CEA, Inria and NIST to organize a two-days event dedicated to sound static analysis techniques and tools, and how they are used to increase the security of software-based systems. The program gathers top-notch experts in the field, from industry, government agencies and research institutes, around the three themes of analysis of legacy code, use in new developments and accountable software quality.

The theme "analysis of legacy code" is meant to all those who have to maintain an aging codebase while facing new security threats from the environment. This theme will be introduced by David A. Wheeler whose contributions to security and open source are well known. From the many articles I like from him, I recommend his in-depth analysis of Heartbleed and the State-of-the-Art Resources (SOAR) for Software Vulnerability Detection, Test, and Evaluation, a government official report detailing the tools and techniques for building secure software. David is leading the CII Best Practiced Badge Program to increase security of open source software. The presentations in this theme will touch on analysis of binaries, analysis of C code, analysis of Linux kernel code and analysis of nuclear control systems.

The theme "use in new developments" is meant to all those who start new projects with security requirements. This theme will be introduced by K. Rustan M. Leino, an emblematic researcher in program verification, who has inspired many of the profound changes in the field from his work on ESC/Modula-3 with Greg Nelson, to his work on a comprehensive formal verification environment around the Dafny language, with many others in between: ESC/Java, Spec#, Boogie, Chalice, etc. The presentations in this theme will touch on securing mobile platforms and our critical infrastructure, as well as describing techniques for verifying floating-point programs and more complex requirements.

The theme "accountable software quality" is meant to all those who need to justify the security of their software, either because they have a regulatory oversight or because of commercial/corporate obligations. This theme will be introduced by David Cok, former VP of Technology and Research at GrammaTech, who is well-known for his work on formal verification tools for Java: ESC/Java, ESC/Java2, now OpenJML. The presentations in this theme will touch on what soundness means for static analysis and the demonstrable benefits it brings, the processes around the use of sound static analysis (including the integration between test and proof results), and the various levels of assurance that can be reached.

The event will take place at the National Institute of Standards and Technologies (NIST) at the invitation of researcher Paul Black. Paul co-authored a noticed report last year on Dramatically Reducing Software Vulnerabilities which highlighted sound static analysis as a promising venue. He will introduce the two days of conference with his perspective on the issue.

The workshop will end with tutorials on Frama-C & SPARK given by the technology developers (CEA and AdaCore), so that attendees can have first-hand experience with using the tools. There will be also vendor displays to discuss with techno providers. All in all, a very unique event to attend, especially when you know that, thanks to our sponsors, participation is free! But registration is compulsory. To see the full program and register for the event, see the webpage of the event.

]]>
Secure Software Architectures Based on Genode + SPARK https://blog.adacore.com/secure-software-architectures-based-on-genode-spark Mon, 05 Mar 2018 13:19:00 +0000 Yannick Moy https://blog.adacore.com/secure-software-architectures-based-on-genode-spark

SPARK user Alexander Senier recently presented their use of SPARK for building secure mobile architecture at BOB Konferenz in Germany. What's nice is that they build on the guarantees that SPARK provides at software level, using them to create a secure software architecture based on the Genode operating system framework. At 19:07 in the video he presents 3 interesting architectural designs (policy objects, trusted wrappers, and transient components) that make it possible to build a trustworthy system out of untrustworthy building blocks (like a Web browser or a network stack). Almost as exciting as Alchemy's goal of transforming lead into gold!

Their solution is to design architectures where untrusted components must communicate through trusted ones. They use Genode to enforce the rule that no other communications are allowed and SPARK to make sure that trusted components can really be trusted. You can see an example of an application they build with these technologies at Componolit at 33:37 in the video: a baseband firewall, to protect the Android platform on a mobile device (e.g., your phone) from attacks that get through the baseband processor, which manages radio communications on your mobile.

As the title of the talk says, for security of connected devices in the modern world, we are at a time "when one beyond-mainstream technology is not enough". For more info on what they do, see Componolit website.

]]>

Updated July 2018

The micro:bit is a very small ARM Cortex-M0 board designed by the BBC for computer education. It's fitted with a Nordic nRF51 Bluetooth enabled 32bit ARM microcontroller. At $15 it is one of the cheapest yet most fun piece of kit to start embedded programming. Since the initial release of this blog post we have improved the support of Ada and SPARK on the BBC micro:bit. In GNAT Community Edition 2018, the micro:bit is now directly supported on Linux, Windows and MacOS. This means that the procedure to use the board is greatly simplified: • Download and install GNAT arm-elf hosted on your platform: Windows, Linux or MacOS. This package contains the ARM cross compiler as well the required Ada run-times • Download and install GNAT native for your platform: Windows, Linux or MacOS. This package contains the GNAT Programming Studio IDE and an example to run on the micro:bit • Start GNAT Programming Studio • Click on “Create a new Project” • Select the “Scrolling Text” project under “BBC micro:bit” and click Next • Enter the directory where you want the project to be deployed and click Apply • On Linux only: you might need privileges to access the USB ports without which the flash program will say "No connected boards". To do this on Ubuntu, you can do it by creating (as administrator) the file /etc/udev/rules.d/mbed.rules and add the line: SUBSYSTEM=="usb", ATTR{idVendor}=="0d28", ATTR{idProduct}=="0204", MODE:="666" then restarting the service by doing$ sudo udevadm trigger
• Plug your micro:bit board with a USB cable, and wait for the system to recognize it. This can take a few seconds

• Back in GNAT Programming Studio, click on the “flash to board” icon

• That’s it!

We also improved the micro:bit support and documentation in the Ada Drivers Library project. Follow this link for documented examples of the various features available on the board (text scrolling, buttons, digital in/out, analog in/out, music).

## Conclusion

That’s it, your first Ada program on the Micro:Bit! If you have an issue with this procedure, please tell us in the comments section below.

In the meantime, here is an example of the kind of project that you can do with Ada on the Micro:Bit

]]>
Tokeneer Fully Verified with SPARK 2014 https://blog.adacore.com/tokeneer-fully-verified-with-spark-2014 Fri, 23 Feb 2018 09:49:00 +0000 Yannick Moy https://blog.adacore.com/tokeneer-fully-verified-with-spark-2014

Tokeneer is a software for controlling physical access to a secure enclave by means of a fingerprint sensor. This software was created by Altran (Praxis at the time) in 2003 using the previous generation of SPARK language and tools, as part of a project commissioned by the NSA to investigate the rigorous development of critical software using formal methods.

The project artefacts, including the source code, were released as open source in 2008. Tokeneer was widely recognized as a milestone in industrial formal verification. Original project artefacts, including the original source code in SPARK 2005, are available here.

We recently transitioned this software to SPARK 2014, and it allowed us to go beyond what was possible with the previous SPARK technology. The initial transition by Altran and AdaCore took place in 2013-2014, when we translated all the contracts from SPARK 2005 syntax (stylized comments in the code) to SPARK 2014 syntax (aspects in the code). But at the time we did not invest the time to fully prove the resulting translated code. This is what we have now completed. The resulting code is available on GitHub. It will also be available in future SPARK releases as one of the distributed examples.

## What we did

With a few changes, we went from 234 unproved checks on Tokeneer code (the version originally translated to SPARK 2014), down to 39 unproved but justified checks. The justification is important here: there are limitations to GNATprove analysis, so it is expected that users must sometimes step in and take responsibility for unproved checks.

#### Using predicates to express constraints

Most of the 39 justifications in Tokeneer code are for string concatenations that involve attribute 'Image. GNATprove currently does not know that S'Image(X), for a scalar type S and a variable X of this type, returns a rather small string (as specified in Ada RM), so it issues a possible range check message when concatenating such an image with any other string. We chose to isolate such calls to 'Image in dedicated functions, with suitable predicates on their return type to convey the information about the small string result. Take for example enumeration type ElementT in audittypes.ads. We define a function ElementT_Image which returns a small string starting at 1 and with length less than 20 as follows:

   function ElementT_Image (X : ElementT) return CommonTypes.StringF1L20 is
(ElementT'Image (X));
pragma Annotate (GNATprove, False_Positive,
"range check might fail",
"Image of enums of type ElementT are short strings starting at index 1");
pragma Annotate (GNATprove, False_Positive,
"predicate check might fail",
"Image of enums of type ElementT are short strings starting at index 1");

Note the use of pragma Annotate to justify the range check message and the predicate check message that are generated by GNATprove otherwise. Type StringF1L20 is defined as a subtype of the standard String type with additional constraints expressed as predicates. In fact, we create an intermediate subtype StringF1 of strings that start at index 1 and which are not "super flat", i.e. their last index is at least 0. StringF1L20 inherits from the predicate of StringF1 and adds the constraint that the length of the string is no more than 20:

   subtype StringF1 is String with
Predicate => StringF1'First = 1 and StringF1'Last >= 0;
subtype StringF1L20 is StringF1 with
Predicate => StringF1L20'Last <= 20;

#### Moving query functions to the spec

Another crucial change was to give visibility to client code over query functions used in contracts. Take for example the API in admin.ads. It defines the behavior of the administrator through subprograms whose contracts use query functions RolePresent, IsPresent and IsDoingOp:

   procedure Logout (TheAdmin :    out T)
with Global => null,
and not IsDoingOp (TheAdmin);

The issue was that these query functions, while conveniently abstracting away the details of what it means for the administrator to be present, or to be doing an operation, were defined in the body of package Admin, inside file admin.adb. As a result, the proof of client code of Admin had to consider these calls as blackboxes, which resulted in many unprovable checks. The fix here consisted in moving the definition for the query functions inside the private part of the spec file admin.ads: this way, client code still does not see their implementation, but GNATprove can use these expression functions in proving client code.

   function RolePresent (TheAdmin : T) return PrivTypes.PrivilegeT is

function IsPresent (TheAdmin : T) return Boolean is

function IsDoingOp (TheAdmin : T) return Boolean is
(TheAdmin.CurrentOp in OpT);

#### Using type invariants to enforce global invariants

Some global properties on the version in SPARK 2005 were justified manually, like the global invariant maintained in package Auditlog over the global variables encoding the state of the files used to log operations: CurrentLogFile, NumberLogEntries, UsedLogFiles, LogFileEntries. Here is the text for this justification:

-- Proof Review file for

-- VC 6
-- C1:    fld_numberlogentries(state) = (fld_length(fld_usedlogfiles(state)) - 1)
--           * 1024 + element(fld_logfileentries(state), [fld_currentlogfile(state)
--           ]) .
-- C1 is a package state invariant.
-- proof shows that all public routines that modify NumberLogEntries, UsedLogFiles.Length,
-- CurrentLogFile or LogFileEntries(CurrentLogFile) maintain this invariant.
-- This invariant has not been propogated to the specification since it would unecessarily
-- complicate proof of compenents that use the facilities from this package.

We can do better in SPARK 2014, by expressing this property as a type invariant. This requires all four variables to become components of the same record type, so that a single global variable LogFiles replaces them:

   type LogFileStateT is record
CurrentLogFile   : LogFileIndexT  := 1;
NumberLogEntries : LogEntryCountT := 0;
UsedLogFiles     : LogFileListT   :=
LogFileListT'(List   => (others => 1),
LastI  => 1,
Length => 1);
LogFileEntries   : LogFileEntryT  := (others => 0);
end record
with Type_Invariant =>
Valid_NumberLogEntries
(CurrentLogFile, NumberLogEntries, UsedLogFiles, LogFileEntries);

LogFiles         : LogFilesT := LogFilesT'(others => File.NullFile)
with Part_Of => FileState;

With this change, all public subprograms updating the state of log files can now assume the invariant holds on entry (it is checked by GNATprove on every call) and must restore it on exit (it is checked by GNATprove when returning from the subprogram). Locally defined subprograms need not obey this constraint however, which is exactly what is needed here. One subtlety is that some of these local subprograms where accessing the state of log files as global variables. If we had kept LogFiles as a global variable, SPARK rules would have required that its invariant is checked on entry and exit from this subprograms. Instead, we changed the signature of these local subprograms to take LogFiles as an additional parameter, on which the invariant needs not hold.

#### Other transformations on contracts

A few other transformations were needed to make contracts provable with SPARK 2014. In particular, it was necessary to change a number of "and" logical operations into their short-circuit version "and then". See for example this part of the precondition of Processing in tismain.adb:

       (if (Admin.IsDoingOp(TheAdmin) and
then
Admin.RolePresent(TheAdmin) = PrivTypes.Guard)

The issue was that calling TheCurrentOp requires that IsDoingOp holds:

   function TheCurrentOp (TheAdmin : T) return OpT
with Global => null,
Pre    => IsDoingOp (TheAdmin);

Since "and" logical operation evaluates both its operands, TheCurrentOp will also be called in contexts where IsDoingOp does not hold, thus leading to a precondition failure. The fix is simply to use the short-circuit equivalent:

       (if (Admin.IsDoingOp(TheAdmin) and then
then
Admin.RolePresent(TheAdmin) = PrivTypes.Guard)

We also added a few loop invariants that were missing.

You can read the original Tokeneer report for a description of the security properties that were provably enforced through formal verification.

To demonstrate that indeed formal verification brings assurance that some security vulnerabilities are not present, we have seeded four vulnerabilities in the code, and reanalyzed it. The analysis of GNATprove (either through flow analysis or proof) detected all four: an information leak, a back door, a buffer overflow and an implementation flaw. You can see that in action in this short 4-minutes video.

]]>

This blog post is part two of a tutorial based on the OpenGLAda project and will cover implementation details such as a type system for interfacing with C, error handling, memory management, and loading functions.

If you haven't read part one I encourage you to do so. It can be found here

# Wrapping Types

As part of the binding process we noted in the previous blog post that we will need to translate typedefs within the OpenGL C headers into Ada types so that our description of C functions that take arguments or return a value are accurate. Let’s begin with the basic numeric types:

with Interfaces.C;

package GL.Types is
type Int   is new Interfaces.C.int;      --  GLint
type UInt  is new Interfaces.C.unsigned; --  GLuint

subtype Size is Int range 0 .. Int'Last; --  GLsizei

type Single is new Interfaces.C.C_float; --  GLfloat
type Double is new Interfaces.C.double;  --  GLdouble
end GL.Types;

We use Single as a name for the single-precision floating point type to avoid confusion with Ada's Standard.Float. Moreover, we can apply Ada’s powerful numerical typing system in our definition of GLsize by defining it with a non-negative range. This affords us some extra compile-time and run-time checks without having to add any conditionals – something not possible in C.

The type list above is, of course, shortened for this post, however, two important types are explicitly declared elsewhere:

• GLenum, which is used for parameters that take a well-defined set of values specified within the #define directive in the OpenGL header. Since we want to make the Ada interface safe we will use real enumeration types for that.
• GLboolean, which is an unsigned char representing a boolean value. We do not want to have a custom boolean type in the Ada API because it will not add any value compared to using Ada's Boolean type (unlike e.g. the Int type, which may have a different range than Ada's Integer type).

For these types, we define another package called GL.Low_Level:

with Interfaces.C;

package GL.Low_Level is
type Bool is new Boolean;

subtype Enum is Interfaces.C.unsigned;
private
for Bool use (False => 0, True => 1);
for Bool'Size use Interfaces.C.unsigned_char'Size;
end GL.Low_Level;

We now have a Bool type that we can use for API imports and an Enum type that we will solely use to define the size of our enumeration types. Note that Bool also is an enumeration type, but uses the size of unsigned_char because that is what OpenGL defines for GLboolean.

To show how we can wrap GLenum into actual Ada enumeration types, lets examine glGetError which is defined like this in the C header:

GLenum glGetError(void);

The return value is one of several error codes defined as preprocessor macros in the header. We translate these into an Ada enumeration then wrap the subprogram resulting in the following:

package GL.Errors is
type Error_Code is
(No_Error, Invalid_Enum, Invalid_Value, Invalid_Operation,
Stack_Overflow, Stack_Underflow, Out_Of_Memory,
Invalid_Framebuffer_Operation);

function Error_Flag return Error_Code;
private
for Error_Code use
(No_Error                      => 0,
Invalid_Enum                  => 16#0500#,
Invalid_Value                 => 16#0501#,
Invalid_Operation             => 16#0502#,
Stack_Overflow                => 16#0503#,
Stack_Underflow               => 16#0504#,
Out_Of_Memory                 => 16#0505#,
Invalid_Framebuffer_Operation => 16#0506#);
for Error_Code'Size use Low_Level.Enum'Size;
end GL.Errors;

With the above code we encode the errors defined in the C header as representations for our enumeration values - this way, our safe enumeration type has the exact same memory layout as the defined error codes and maintains compatibility.

We then add the backend for Error_Flag as import to GL.API:

function Get_Error return Errors.Error_Code;
pragma Import (StdCall, Get_Error, "glGetError");

# Error Handling

The OpenGL specification states that whenever an error arises while calling a function of the API, an internal error flag gets set. This flag can then be retrieved with the function glGetError we wrapped above.

It would certainly be nicer, though, if these API calls would raise Ada exceptions instead, but this would mean that in every wrapper to an OpenGL function that may set the error flag we'd need to call Get_Error, and, when the returned flag is something other than No_Error, we'd subsequently need to raise the appropriate exception. Depending on what the user does with the API, this may lead to significant overhead (let us not forget that OpenGL is much more performance-critical than it is safety-critical). In fact, more recent graphics API’s like Vulkan have debugging extensions which require manual tuning to receive error messages - in other words, due to overhead, Vulkan turns off all error checking by default.

So, what we will provide is a feature that auto-raises exceptions whenever the error flag is set, but make it optional. To achieve this, Ada exceptions derived from OpenGL’s error flags need to be defined.

Let’s add the following exception definitions to GL.Errors:

Invalid_Operation_Error             : exception;
Out_Of_Memory_Error                 : exception;
Invalid_Value_Error                 : exception;
Stack_Overflow_Error                : exception;
Stack_Underflow_Error               : exception;
Invalid_Framebuffer_Operation_Error : exception;
Internal_Error                      : exception;

Notice that the exceptions carry the same names as the corresponding enumeration values in the same package. This is not a problem because Ada is intelligent enough to know which one of the two we want depending on context. Also notice the exception Internal_Error which does not correspond to any OpenGL error – we'll see later what we need it for.

Next, we need a procedure that queries the error flag and possibly raises the appropriate exception. Since we will be using such a procedure almost everywhere in our wrapper let’s declare it in the private part of the GL package so that all of GL's child packages have access:

procedure Raise_Exception_On_OpenGL_Error;

And in the body:

procedure Raise_Exception_On_OpenGL_Error is separate;

Here, we tell Ada that this procedure is defined in a separate compilation unit enabling us to provide different implementations depending on whether the user wants automatic exception raising to be enabled or not. Before we continue though let’s set up our project with this in mind:

library project OpenGL is
--  Windowing_System config omitted

type Toggle_Type is ("enabled", "disabled");
Auto_Exceptions : Toggle_Type := external ("Auto_Exceptions", "enabled");

OpenGL_Sources := ("src");
case Auto_Exceptions is
when "enabled" =>
OpenGL_Sources := OpenGL_Sources & "src/auto_exceptions";
when "disabled" =>
OpenGL_Sources := OpenGL_Sources & "src/no_auto_exceptions";
end case;
for Source_Dirs use OpenGL_Sources;

--  packages and other things omitted
end OpenGL;

To conform with the modifications made to the project file we must now create two new directories inside the src folder and place the implementations of our procedure accordingly. GNAT expects the source files to both be named gl-raise_exception_on_openl_error.adb. The implementation of no_auto_exceptions is trivial:

separate (GL)
procedure Raise_Exception_On_OpenGL_Error is
begin
null;
end Raise_Exception_On_OpenGL_Error;


And the one in auto_exceptions looks like this:

with GL.Errors;

separate (GL)
procedure Raise_Exception_On_OpenGL_Error is
begin
case Errors.Error_Flag is
when Errors.Invalid_Operation             => raise Errors.Invalid_Operation_Error;
when Errors.Invalid_Value                 => raise Errors.Invalid_Value_Error;
when Errors.Invalid_Framebuffer_Operation => raise Errors.Invalid_Framebuffer_Operation_Error;
when Errors.Out_Of_Memory                 => raise Errors.Out_Of_Memory_Error;
when Errors.Stack_Overflow                => raise Errors.Stack_Overflow_Error;
when Errors.Stack_Underflow               => raise Errors.Stack_Underflow_Error;
when Errors.Invalid_Enum                  => raise Errors.Internal_Error;
when Errors.No_Error                      => null;
end case;
exception
when Constraint_Error => raise Errors.Internal_Error;
end Raise_Exception_On_OpenGL_Error;

The exception section at the end is used to detect cases where glGetError returns a value we did not know of at the time of implementing this wrapper. Ada would then try to map this value to the Error_Code enumeration, and since the value does not correspond to any value specified in the type definition, the program will raise a Constraint_Error. Of course, OpenGL is very conservative about adding error flags, so this is unlikely to happen, but it is still nice to plan for the future.

# Types Fetching Function Pointers at Runtime

### Part 1: Implementing the "Fetching" Function

As previously noted, many functions from the OpenGL API must be retrieved as a function pointer at run-time instead of linking to them at compile-time. The reason for this once again comes down to the concept of graceful degradation -- if some functionality exists as an extension (especially functions not part of the OpenGL core) but is unimplemented by a target graphics card driver then the programmer will be able to identify or recognize this case when setting the relevant function pointers during execution. Unfortunately though, this creates an extra step which prevents us from simply importing the whole of the API, and, worse still, on Windows no functions being defined on OpenGL 2.0 or later are available for compile time linking, making programmatic queries required.

So then, the question arises: how will these function pointers are to be retrieved? Sadly, this functionality is not available from within the OpenGL API or driver, but instead is provided by platform-specific extensions, or more specifically, the windowing system supporting OpenGL. So, as with exception handling, we will use a procedure with multiple implementations and switch to the appropriate implementation via GPRBuild:

case Windowing_System is
when "windows" => OpenGL_Sources := OpenGL_Sources & "src/windows";
when "x11"     => OpenGL_Sources := OpenGL_Sources & "src/x11";
when "quartz"  => OpenGL_Sources := OpenGL_Sources & "src/mac";
end case;

...and we declare this function in the main source:

function GL.API.Subprogram_Reference (Function_Name : String)
return System.Address;

Then finally, in the windowing-system specific folders, we place the implementation and necessary imports from the windowing system's API. Those imports and the subsequent implementations are not very interesting, so I will not discuss them at length here, but I will show you the implementation for Apple's Mac operating system to give you an idea:

with GL.API.Mac_OS_X;

function GL.API.Subprogram_Reference (Function_Name : String)

-- OSX-specific implementation uses CoreFoundation functions
use GL.API.Mac_OS_X;

package IFC renames Interfaces.C.Strings;

GL_Function_Name_C : IFC.chars_ptr := IFC.New_String (Function_Name);

Symbol_Name : constant CFStringRef :=
CFStringCreateWithCString
cStr     => GL_Function_Name_C,
encoding => kCFStringEncodingASCII);

CFBundleGetFunctionPointerForName
(bundle      => OpenGLFramework,
functionName => Symbol_Name);
begin
CFRelease (Symbol_Name);
IFC.Free (GL_Function_Name_C);
return Result;
end GL.API.Subprogram_Reference;

With the above code in effect, we are now able to retrieve the function pointers, however, we still need to implement the querying machinery to which there are three possible approaches:

• Lazy: When a feature is first needed, its corresponding function pointer is loaded and stored for future use. This approach to loading may produce the least amount of work needed to be done by the resulting application, although, theoretically, it makes performance of a call unpredictable. Since fetching function pointers is fairly trivial operation, however, this is not really a necessarily practical reason against this.
• Eager: At some defined point in time, a call gets issued to a loading function for every function pointer that is supported by OpenGLAda. The Eager approach produces the largest amount of work for the resulting application, but again, since loading is trivial it does not noticeably slow down the application (and, even if it did, it would so during initialization where it is most tolerable).
• Explicit: The user is required to specify which features they want to use and we only load the function pointers related to such features. Explicit loading places the heaviest burden on the user, since they must state which features they will be using.

Overall, the consequences of choosing one of these three possibilities are mild, so we will go with the one easiest to implement, which is the eager approach and is the same one used by many other popular OpenGL libraries.

### Part 2: Autogenerating the Fetching Implementation

For each OpenGL function we import that must be loaded at runtime we need to create three things:

• The definition of an access type describing the function's parameters and return types.
• A global variable having this type to hold the function pointer as soon as it gets loaded.
• A call to a platform-specific function which will return the appropriate function pointer from a DLL or library for storage into our global function pointer.

Implementing these segments for each subprogram is a very repetitive task, which hints to the possibility of automating it. To check whether this is feasible, let’s go over the actual information we need to write in each of these code segments for an imported OpenGL function:

• The name of the C function we import

As you can see, this is almost exactly the same information we would need to write an imported subprogram loaded at compile time! To keep all information about imported OpenGL function centralized, let’s craft a simple specification format where we may list all this information for each subprogram.

Since we need to define Ada subprogram signatures, it seems a good idea to use Ada-like syntax (like GPRBuild does for its project files). After writing a small parser (I will not show details here since that is outside the scope of this post), we can now process a specification file looking like the following. We will discuss the package GL.Objects.Shaders and more about what it does in a bit.

with GL.Errors;
with GL.Types;

spec GL.API is
use GL.Types;

function Get_Error return Errors.Error_Code with Implicit => "glGetError";
procedure Flush with Implicit => "glFlush";

with
end GL.API;

This specification contains two imports we have already created manually and one new import – in this case we use Create_Shader as an example for a subprogram that needs to be loaded via function pointer. We use Ada 2012-like syntax for specifying the target link name with aspects and the import mode. There are two import modes:

• Implicit - meaning that the subprogram will be imported via pragmas. This will give us a subprogram declaration that will be bound to its implementation by the dynamic library loader. So it happens implicitly and we do not actually need to write any code for it. This is what we previously did in our import of glFlush in part one.
• Explicit - meaning that the subprogram will be provided as a function pointer variable. We will need to generate code that assigns a proper value to that variable at runtime in this case.

Processing this specification will generate us the following Ada subunits:

with GL.Errors;
with GL.Types;

private package GL.API is
use GL.Types;

pragma Convention (StdCall, T1);

function Get_Error return Errors.Error_Code;
pragma Import (StdCall, Get_Error, "glGetError");

procedure Flush;
pragma Import (StdCall, Flush, "glFlush");

end package GL.API;

--  ---------------

with System;
private with GL.API.Subprogram_Reference;
use GL.API;

generic
type Function_Reference is private;
function Load (Function_Name : String) return Function_Reference;

function Load (Function_Name : String) return Function_Reference is
function As_Function_Reference is
Target => Function_Reference);

Raw : System.Address := Subprogram_Reference (Function_Name);
begin
return As_Function_Reference (Raw);

begin
end GL.Load_Function_Pointers;

Notice how our implicit subprograms get imported like before, but for the explicit subprogram, a type T1 got created as an access type to the subprogram, and a global variable Create_Shader is defined to be of this type - satisfying all of our needs.

The procedure GL.Load_Function_Pointers contains the code to fill this variable with the right value by obtaining a function pointer using the platform-specific implementation discussed above. The generic load function exists so that additional function pointers can be loaded using this same code.

The only thing left to do is to expose this functionality in the public interface like the example below:

package GL is
--  ... other code

procedure Init;

--  ... other code
end GL;

--  ------

package body GL is
--  ... other code

--  ... other code
end GL;

Of course, we now require the user to explicitly call Init somewhere in their code... You might think that we could automatically execute the loading code at package initialization, but this would not work, because some OpenGL implementations (most prominently the one on Windows) will refuse to load any OpenGL function pointers unless there is a current OpenGL context. This context will only exist after we created an OpenGL surface to render on, which will be done programmatically by the user.

In practice, OpenGLAda includes a binding to the GLFW library as a platform-independent way of creating windows with an OpenGL surface on them, and this binding automatically calls Init whenever a window is made current (i.e. placed in foreground), so that the user does not actually need to worry about it. However, there may be other use-cases that do not employ GLFW, like, for example, creating an OpenGL surface widget with GtkAda. In that case, calling Init manually is still required given our design.

# Memory Management

The OpenGL API enables us to create various objects that reside in GPU memory for things like textures or vertex buffers. Creating such objects gives us an ID (kind of like a memory address) which we can then use to refer to the object instead of a memory address. To avoid memory leaks, we will want to manage these IDs automatically in our Ada wrapper so they are automatically destroyed once the last reference vanishes. Ada’s Controlled types are an ideal candidate for the job. Let's start writing a package GL.Objects to encapsulate the functionality:

package GL.Objects is
use GL.Types;

type GL_Object is abstract tagged private;

procedure Initialize_Id (Object : in out GL_Object);

procedure Clear (Object : in out GL_Object);

function Initialized (Object : GL_Object) return Boolean;

procedure Internal_Create_Id
(Object : GL_Object; Id : out UInt) is abstract;

procedure Internal_Release_Id
(Object : GL_Object; Id : UInt) is abstract;
private
type GL_Object_Reference;
type GL_Object_Reference_Access is access all GL_Object_Reference;

type GL_Object_Reference is record
GL_Id           : UInt;
Reference_Count : Natural;
Is_Owner        : Boolean;
end record;

type GL_Object is abstract new Ada.Finalization.Controlled with record
Reference : GL_Object_Reference_Access := null;
end record;

-- Increases reference count.
overriding procedure Adjust (Object : in out GL_Object);

-- Decreases reference count. Destroys texture when it reaches zero.
overriding procedure Finalize (Object : in out GL_Object);
end GL.Objects;   

GL_Object is our smart pointer here, and GL_Object_Reference is the holder of the object's ID as well as the reference count. We will derive the actual object types (which there are quite a few) from GL_Object so that the base type can be abstract and we can define some subprograms that must be overridden by the child types to enforce the rule. Note that since the class hierarchy is based on GL_Object, all derived types have an identically-typed handle to a GL_Object_Reference object, and thus, our reference-counting is independent of the actual derived type.

The only thing the derived type must declare in order for our automatic memory management to work is how to create and delete the OpenGL object in GPU memory – this is what Internal_Create_Id and Internal_Release_Id in the above segment are for. Because they are abstract, they must be put into the public part of the package even though they should never be called by the user directly.

The core of our smart pointer machinery will be implemented in the Adjust and Finalize procedures provided by Ada.Finalization.Controlled. Since this topic has already been extensively covered in this Ada Gem I am going to skip over the gory implementation details.

So, to create a new OpenGL object the user must call Initialize_Id on a smart pointer which assigns the ID of the newly created object to the smart pointer's backing object. Clear can then later be used to make the smart pointer uninitialized again (but only delete the object if the reference count reaches zero).

To test our system, let's implement a Shader object. Shader objects will hold source code and compiled binaries of GLSL (GL Shading Language) shaders. We will call this package GL.Objects.Shaders in keeping with the rest of the project's structure:

package GL.Objects.Shaders is
pragma Preelaborate;

procedure Set_Source (Subject : Shader; Source : String);

function Compile_Status (Subject : Shader) return Boolean;

function Info_Log (Subject : Shader) return String;

private

overriding
procedure Internal_Create_Id (Object : Shader; Id : out UInt);

overriding
procedure Internal_Release_Id (Object : Shader; Id : UInt);

end GL.Objects.Shaders;

The two overriding procedures are implemented like this:

overriding
procedure Internal_Create_Id (Object : Shader; Id : out UInt) is
begin
Raise_Exception_On_OpenGL_Error;
end Internal_Create_Id;

overriding
procedure Internal_Release_Id (Object : Shader; Id : UInt) is
pragma Unreferenced (Object);
begin
Raise_Exception_On_OpenGL_Error;
end Internal_Release_Id;

Of course, we need to add the subprogram Delete_Shader to our import specification so it will be available in the generated GL.API package. A nice thing is that, in Ada, pointer dereference is often done implicitly so we need not worry whether Create_Shader and Delete_Shader are loaded via function pointers or with the dynamic library loader – the code would look exactly the same in both cases!

# Documentation

One problem we did not yet address is documentation. After all, because we are adding structure and complexity to the OpenGL API, which does not exist in its specification, how is a user supposed to find the wrapper of a certain OpenGL function they want to use?

What we need to do, then, is generate a list where the name of each OpenGL function we wrap is listed and linked to its respective wrapper function in OpenGLAda's API. Of course, we do not want to generate that list manually. Instead, let’s use our import specification again and enrich it with additional information:

   function Get_Error return Errors.Error_Code with
Implicit => "glGetError",  Wrapper => "GL.Errors.Error_Flag";
procedure Flush with
Implicit => "glFlush", Wrapper => "GL.Flush";

With the new "aspect-like" declarations in our template we can enhance our generator with code that writes a Markdown file listing all imported OpenGL functions and linking that to their wrappers. In theory, we could even avoid adding the wrapper information explicitly by analyzing OpenGLAda's code to detect which subprogram wraps the OpenGL function. Tools like ASIS and LibAdaLang would help us with that, but that implementation would be far more work than adding our wrapper references explicitly.

The generated list can be seen on OpenGLAda's website showing all the functions that are actually supported. It is intended to be navigated via search (a.k.a. Ctrl+F).

# Conclusion

By breaking down the complexities of a large C API like OpenGL, we have gone through quite a few improvements that can be done when creating an Ada binding. Some of them were not so obvious and probably not necessary for classifying a binding as thick - for example, auto-loading our function pointers at run-time was simply an artifact of supporting OpenGL and not covered inside the scope of the OpenGL API itself.

We also discovered that when wrapping a C API in Ada we must lift the interface to a higher level since Ada is indeed designed to be a higher-level language than C, and, in this vein, it was natural to add features that are not part of the original API to make it fit more at home in an Ada context.

It might be tempting to write a thin wrapper for your Ada project to avoid overhead, but beware - you will probably still end up writing a thick wrapper. After all, the code around calls that facilitates thinly wrapped functions and the need for data conversions does not simply vanish!

Of course, all this is a lot of work! To give you some numbers: The OpenGLAda repository contains 15,874 lines of Ada code (excluding blanks and comments, tests, and examples) while, for comparison, the C header gl.h (while missing many key features) is only around 3,000 lines.

]]>
For All Properties, There Exists a Proof https://blog.adacore.com/for-all-properties-there-exists-a-proof Mon, 19 Feb 2018 10:15:00 +0000 Yannick Moy https://blog.adacore.com/for-all-properties-there-exists-a-proof

With the recent addition of a Manual Proof capability in SPARK 18, it is worth looking at an example which cannot be proved by automatic provers, to see the options that are available for proving it with SPARK. The following code is such an example, where the postcondition of Do_Nothing cannot be proved with provers CVC4 or Z3, although it is exactly the same as its precondition:

   subtype Index is Integer range 1 .. 10;
type T1 is array (Index) of Integer;
type T2 is array (Index) of T1;

procedure Do_Nothing (Tab : T2) with
Ghost,
Pre  => (for all X in Index => (for some Y in Index => Tab(X)(Y) = X + Y)),
Post => (for all X in Index => (for some Y in Index => Tab(X)(Y) = X + Y));

procedure Do_Nothing (Tab : T2) is null;

The issue is that SMT provers that we use in SPARK like CVC4 and Z3 do not recognize the similarity between the property assumed here (the precondition) and the property to prove (the postcondition). To such a prover, the formula to prove (the Verification Condition or VC) looks like the following in SMTLIB2 format:

(declare-sort integer 0)
(declare-fun to_rep (integer) Int)
(declare-const tab (Array Int (Array Int integer)))
(assert
(forall ((x Int))
(=> (and (<= 1 x) (<= x 10))
(exists ((y Int))
(and (and (<= 1 y) (<= y 10))
(= (to_rep (select (select tab x) y)) (+ x y)))))))
(declare-const x Int)
(assert (<= 1 x))
(assert (<= x 10))
(assert
(forall ((y Int))
(=> (and (<= 1 y) (<= y 10))
(not (= (to_rep (select (select tab x) y)) (+ x y))))))
(check-sat)

We see here some of the encoding from SPARK programming language to SMTLIB2 format: the standard integer type Integer is translated into an abstract type integer, with a suitable projection to_rep from this abstract type to the standard Int type of mathematical integers in SMTLIB2; the array types T1 and T2 are translated into SMTLIB2 Array types. The precondition, which is assumed here, is directly transformed into a universally quantified axiom (starting with "forall"), while the postcondition is negated and joined with the other hypotheses, as an SMT solver will try to deduce an inconsistency to prove the goal by contradiction. So the negated postcondition becomes:

   (for some X in Index => (for all Y in Index => not (Tab(X)(Y) = X + Y)));

The existentially quantified variable X becomes a constant x in the VC, with assertions stating its bounds 1 and 10, and the universal quantification becomes another axiom.

Now it is useful to understand how SMT solvers deal with universally quantified axioms. Obviously, they cannot "try out" every possible value of parameters. Here, the quantified variable ranges over all mathematical integers! And in general, we may quantify over values of abstract types which cannot be enumerated. Instead, SMT solvers find suitable "candidates" for instantiating the axioms. The main technique to find such candidates is called trigger-based instantiation. The SMT solver identifies terms in the quantified axiom that contain the quantified variables, and match them with the so-called "ground" terms in the VC (terms that do not contain quantified or "bound" variables). Here, such a term containing x in the first axiom is (to_rep (select (select tab x) y)), or simply (select tab x), while in the second axiom such a term containing y could be (to_rep (select (select tab x) y)) or (select (select tab x) y). The issue with the VC above is that these do not match any ground term, hence neither CVC4 nor Z3 can prove the VC.

Note that Alt-Ergo is able to prove the VC, using the exact same trigger-based mechanism, because it considers (select tab x) from the second axiom as a ground term in matching. Alt-Ergo uses this term to instantiate the first axiom, which in turn provides the term (select (select tab x) sko_y) [where sko_y is a fresh variable corresponding to the skolemisation of the existentially quantified variable y]. Alt-Ergo then uses this new term to instantiate the second axiom, resulting in a contradiction. So Alt-Ergo can deduce that the VC is unsatisfiable, hence proves the original (non-negated) postcondition.

I am going to consider in the following alternative means to prove such a property, when all SMT provers provided with SPARK fail.

## solution 1 - use an alternative automatic prover

As the property to prove is an exact duplication of a known property in hypothesis, a different kind of provers, called provers by resolution, is a perfect fit. Here, I'm using E prover, but many others are supported by the Why3 platform used in SPARK, and would be as effective. The first step is to install E prover from its website (www.eprover.org) or from its integration in your Linux distro. Then, you need to run the executable why3config to generate a suitable .why3.conf configuration file in your HOME directory, with the necessary information for Why3 to know how to generate VCs for E prover, and how to call it. Currently, GNATprove cannot be called with --prover=eprover, so instead I called directly the underlying Why3 tool and it proves the desired postcondition:

$why3 prove -L /path/to/theories -P Eprover quantarrays.mlw quantarrays.mlw Quantarrays__subprogram_def WP_parameter def : Valid (0.02s) ## solution 2 - prove interactively With SPARK 18 comes the possibility to prove a VC interactively inside the editor GPS. Just right-click on the message about the unproved postcondition and select "Start Manual Proof". Various panels are opened in GPS: Here, the manual proof is really simple. We start by applying axiom H, as the conclusion of this axiom matches the goal to prove, which makes it necessary to prove the conditions for applying axiom H. Then we use the known bounds on X in axioms H1 and H2 to prove the conditions. And we're done! The following snapshot shows that GPS now confirms that the VC has been proved: Note that it is possible to call an automatic prover by its name, like "altergo", "cvc4", or "z3" to prove the VC automatically after the initial application of axiom H. ## solution 3 - use an alternative interactive prover It is also possible to use powerful external interactive provers like Coq or Isabelle. You first need to install these on your machine. GNATprove and GPS are directly integrated with Coq, so that you can right-click on the unproved postcondition, select "Prove Check", then manually enter the switch "--prover=coq" to select Coq prover. GPS will then open CoqIDE on the VC as follows: The proof in Coq is as simple as before. Here is the exact set of tactics to apply to reproduce what we did with manual proof in GPS: Note that the tactic "auto" in Coq proves this VC automatically. ## What to Remember There are many ways forward that are available when automatic provers available with GNATprove fail to prove a property. We already presented in various occasions the use of ghost code. Here we described three other ways: using an alternative automatic prover, proving interactively, and using an alternative interactive prover. [cover image of Kurt Gödel, courtesy of WikiPedia, who demonstrated in fact that no all true properties can be ever proved] ]]> Bitcoin blockchain in Ada: Lady Ada meets Satoshi Nakamoto https://blog.adacore.com/bitcoin-in-ada Thu, 15 Feb 2018 13:00:00 +0000 Johannes Kanig https://blog.adacore.com/bitcoin-in-ada Bitcoin is getting a lot of press recently, but let's be honest, that's mostly because a single bitcoin worth 800 USD in January 2017 was worth almost 20,000 USD in December 2017. However, bitcoin and its underlying blockchain are beautiful technologies that are worth a closer look. Let’s take that look with our Ada hat on! ## So what's the blockchain? “Blockchain” is a general term for a database that’s maintained in a distributed way and is protected against manipulation of the entries; Bitcoin is the first application of the blockchain technology, using it to track transactions of “coins”, which are also called Bitcoins. Conceptually, the Bitcoin blockchain is just a list of transactions. Bitcoin transactions in full generality are quite complex, but as a first approximation, one can think of a transaction as a triple (sender, recipient, amount), so that an initial mental model of the blockchain could look like this: SenderRecipientAmount <Bitcoin address><Bitcoin address>0.003 BTC <Bitcoin address><Bitcoin address>0.032 BTC ......... Other data, such as how many Bitcoins you have, are derived from this simple transaction log and not explicitly stored in the blockchain. Modifying or corrupting this transaction log would allow attackers to appear to have more Bitcoins than they really have, or, allow them to spend money then erase the transaction and spend the same money again. This is why it’s important to protect against manipulation of that database. The list of transactions is not a flat list. Instead, transactions are grouped into blocks. The blockchain is a list of blocks, where each block has a link to the previous block, so that a block represents the full blockchain up to that point in time: Thinking as a programmer, this could be implemented using a linked list where each block header contains a prev pointer. The blockchain is grown by adding new blocks to the end, with each new block pointing to the former previous block, so it makes more sense to use a prev pointer instead of a next pointer. In a regular linked list, prev pointer points directly to the memory used for the previous block. But the uniqueness of the blockchain is that it's a distributed data structure; it's maintained by a network of computers or nodes. Every bitcoin full node has a full copy of the blockchain, but what happens if members of the network don't agree on the contents of some transaction or block? A simple memory corruption or malicious act could result in a client having incorrect data. This is why the blockchain has various checks built-in that guarantee that corruption or manipulation can be detected. ## How does Bitcoin check data integrity? Bitcoin’s internal checks are based on a cryptographic hash function. This is just a fancy name for a function that takes anything as input and spits out a large number as output, with the following properties: • The output of the function varies greatly and unpredictably even with tiny variations of the input; • It is extremely hard to deduce an input that produces some specific output number, other than by using brute force; that is, by computing the function again and again for a large number of inputs until one finds the input that produces the desired output. The hash function used in Bitcoin is called SHA256. It produces a 256-bit number as output, usually represented as 64 hexadecimal digits. Collisions (different input data that produces the same output hash value) are theoretically possible, but the output space is so big that collisions on actual data are considered extremely unlikely, in fact practically impossible. The idea behind the first check of Bitcoin's data integrity is to replace a raw pointer to a memory region with a “safe pointer” that can, by construction, only point to data that hasn’t been tampered with. The trick is to use the hash value of the data in the block as the “pointer” to the data. So instead of a raw pointer, one stores the hash of the previous block as prev pointer: Here, I’ve abbreviated the 256-bit hash values by their first two and last four hex digits – by design, Bitcoin block hashes always start with a certain number of leading zeroes. The first block contains a "null pointer" in the form of an all zero hash. Given a hash value, it is infeasible to compute the data associated with it, so one can't really "follow" a hash like one can follow a pointer to get to the real data. Therefore, some sort of table is needed to store the data associated with the hash value. Now what have we gained? The structure can no longer easily be modified. If someone modifies any block, its hash value changes, and all existing pointers to it are invalidated (because they contain the wrong hash value). If, for example, the following block is updated to contain the new prev pointer (i.e., hash), its own hash value changes as well. The end result is that the whole data structure needs to be completely rewritten even for small changes (following prev pointers in reverse order starting from the change). In fact such a rewrite never occurs in Bitcoin, so one ends up with an immutable chain of blocks. However, one needs to check (for example when receiving blocks from another node in the network) that the block pointed to really has the expected hash. ## Block data structure in Ada To make the above explanations more concrete, let's look at some Ada code (you may also want to have bitcoin documentation available). A bitcoin block is composed of the actual block contents (the list of transactions of the block) and a block header. The entire type definition of the block looks like this (you can find all code in this post plus some supporting code in this github repository):  type Block_Header is record Version : Uint_32; Prev_Block : Uint_256; Merkle_Root : Uint_256; Timestamp : Uint_32; Bits : Uint_32; Nonce : Uint_32; end record; type Transaction_Array is array (Integer range <>) of Uint_256; type Block_Type (Num_Transactions : Integer) is record Header : Block_Header; Transactions : Transaction_Array (1 .. Num_Transactions); end record; As discussed, a block is simply the list of transactions plus the block header which contains additional information. With respect to the fields for the block header, for this blog post you only need to understand two fields: • Prev_Block a 256-bit hash value for the previous block (this is the prev pointer I mentioned before) • Merkle_Root a 256-bit hash value which summarizes the contents of the block and guarantees that when the contents change, the block header changes as well. I will explain how it is computed later in this post. The only piece of information that’s missing is that Bitcoin usually uses the SHA256 hash function twice to compute a hash. So instead of just computing SHA256(data), usually SHA256(SHA256(data)) is computed. One can write such a double hash function in Ada as follows, using the GNAT.SHA256 library and String as a type for a data buffer (we assume a little-endian architecture throughout the document, but you can use the GNAT compiler’s Scalar_Storage_Order feature to make this code portable): with GNAT.SHA256; use GNAT.SHA256; function Double_Hash (S : String) return Uint_256 is D : Binary_Message_Digest := Digest (S); T : String (1 .. 32); for T'Address use D'Address; D2 : constant Binary_Message_Digest := Digest (T); function To_Uint_256 is new Ada.Unchecked_Conversion (Source => Binary_Message_Digest, Target => Uint_256); begin return To_Uint_256 (D2); end Double_Hash;  The hash of a block is simply the hash of its block header. This can be expressed in Ada as follows (assuming that the size in bits of the block header, Block_Header’Size in Ada, is a multiple of 8):  function Block_Hash (B : Block_Type) return Uint_256 is S : String (1 .. Block_Header'Size / 8); for S'Address use B.Header'Address; begin return Double_Hash (S); end Block_Hash; Now we have everything we need to check the integrity of the outermost layer of the blockchain. We simply iterate over all blocks and check that the previous block indeed has the hash used to point to it: declare Cur : String := "00000000000000000044e859a307b60d66ae586528fcc6d4df8a7c3eff132456"; S : String (1 ..64); begin loop declare B : constant Block_Type := Get_Block (Cur); begin S := Uint_256_Hex (Block_Hash (B)); Put_Line ("checking block hash = " & S); if not (Same_Hash (S,Cur)) then Ada.Text_IO.Put_Line ("found block hash mismatch"); end if; Cur := Uint_256_Hex (B.Prev_Block); end; end loop; end; A few explanations: the Cur string contains the hash of the current block as a hexadecimal string. At each iteration, we fetch the block with this hash (details in the next paragraph) and compute the actual hash of the block using the Block_Hash function. If everything matches, we set Cur to the contents of the Prev_Block field. Uint_256_Hex is the function to convert a hash value in memory to its hexadecimal representation for display. One last step is to get the actual blockchain data. The size of the blockchain is now 150GB and counting, so this is actually not so straightforward! For this blog post, I added 12 blocks in JSON format to the github repository, making it self-contained. The Get_Block function reads a file with the same name as the block hash to obtain the data, starting at a hardcoded block with the hash mentioned in the code. If you want to verify the whole blockchain using the above code, you have to either query the data using some website such as blockchain.info, or download the blockchain on your computer, for example using the Bitcoin Core client, and update Get_Block accordingly. ## How to compute the Merkle Root Hash So far, we were able to verify the proper chaining of the blockchain, but what about the contents of the block? The objective is now to come up with the Merkle root hash mentioned earlier, which is supposed to "summarize" the block contents: that is, it should change for any slight change of the input. First, each transaction is again identified by its hash, similar to how blocks are identified. So now we need to compute a single hash value from the list of hashes for the transactions of the block. Bitcoin uses a hash function which combines two hashes into a single hash:  function SHA256Pair (U1, U2 : Uint_256) return Uint_256 is type A is array (1 .. 2) of Uint_256; X : A := (U1, U2); S : String (1 .. X'Size / 8); for S'Address use X'Address; begin return Double_Hash (S); end SHA256Pair;  Basically, the two numbers are put side-by-side in memory and the result is hashed using the double hash function. Now we could just iterate over the list of transaction hashes, using this combining function to come up with a single value. But it turns out Bitcoin does it a bit differently; hashes are combined using a scheme that's called a Merkle tree: One can imagine the transactions (T1 to T6 in the example) be stored at the leaves of a binary tree, where each inner node carries a hash which is the combination of the two child hashes. For example, H7 is computed from H1 and H2. The root node carries the "Merkle root hash", which in this way summarizes all transactions. However, this image of a tree is just that - an image to show the order of hash computations that need to be done to compute the Merkle root hash. There is no actual tree stored in memory. There is one peculiarity in the way Bitcoin computes the Merkle hash: when a row has an odd number of elements, the last element is combined with itself to compute the parent hash. You can see this in the picture, where H9 is used twice to compute H11. The Ada code for this is quite straightforward:  function Merkle_Computation (Tx : Transaction_Array) return Uint_256 is Max : Integer := (if Tx'Length rem 2 = 0 then Tx'Length else Tx'Length + 1); Copy : Transaction_Array (1 .. Max); begin if Tx'Length = 1 then return Tx (Tx'First); end if; if Tx'Length = 0 then raise Program_Error; end if; Copy (1 .. Tx'Length) := Tx; if (Max /= Tx'Length) then Copy (Max) := Tx (Tx'Last); end if; loop for I in 1 .. Max / 2 loop Copy (I) := SHA256Pair (Copy (2 * I - 1), Copy (2 *I )); end loop; if Max = 2 then return Copy (1); end if; Max := Max / 2; if Max rem 2 /= 0 then Copy (Max + 1) := Copy (Max); Max := Max + 1; end if; end loop; end Merkle_Computation;  Note that despite the name, the input array only contains transaction hashes and not actual transactions. A copy of the input array is created at the beginning; after each iteration of the loop in the code, it contains one level of the Merkle tree. Both before and inside the loop, if statements check for the edge case of combining an odd number of hashes at a given level. We can now update our checking code to also check for the correctness of the Merkle root hash for each checked block. You can check out the whole code from this repository; the branch “blogpost_1” will stay there to point to the code as shown here. Why does Bitcoin compute the hash of the transactions in this way? Because it allows for a more efficient way to prove to someone that a certain transaction is in the blockchain. Suppose you want to show someone that you sent her the required amount of Bitcoin to buy some product. The person could, of course, download the entire block you indicate and check for themselves, but that’s inefficient. Instead, you could present them with the chain of hashes that leads to the root hash of the block. If the transaction hashes were combined linearly, you would still have to show them the entire list of transactions that come after yours in the block. But with the Merkle hash, you can present them with a “Merkle proof”: that is, just the hashes required to compute the path from your transaction to the Merkle root. In your example, if your transaction is T3, it's enough to also provide H4, H7 and H11: the other person can compute the Merkle root hash from that and compare it with the “official” Merkle root hash of that block. When I first saw this explanation, I was puzzled why an attacker couldn’t modify transaction T3 to T3b and then “invent” the hashes H4b, H7b and H11b so that the Merkle root hash H12 is unchanged. But the cryptographic nature of the hash function prevents this: today, there is no known attack against the hash function SHA256 used in Bitcoin that would allow inventing such input values (but for the weaker hash function SHA1 such collisions have been found). ## Wrap-Up In this blog post I have shown Ada code that can be used to verify the data integrity of blocks from the Bitcoin blockchain. I was able to check the block and Merkle root hashes for all the blocks in the blockchain in a few hours on my computer, though most of the time was spent in Input/Output to read the data in. There are many more rules that make a block valid, most of them related to transactions. I hope to cover some of them in later blog posts. ]]> The Road to a Thick OpenGL Binding for Ada: Part 1 https://blog.adacore.com/the-road-to-a-thick-opengl-binding-for-ada Mon, 05 Feb 2018 15:59:00 +0000 Felix Krause https://blog.adacore.com/the-road-to-a-thick-opengl-binding-for-ada This blog post is part one of a tutorial based on the OpenGLAda project and will cover some the background of the OpenGL API and the basic steps involved in importing platform-dependent C functions. # Motivation Ada was designed by its onset in the late 70’s to be highly compatible with other languages - for example, there are currently native facilities for directly using libraries from C, FORTRAN, COBOL, C++, and even Java. However, there is still a process (although automate-able to a certain extent) that must be followed to safely and effectively import an API or create what we will refer to here as a binding. Additionally, foreign APIs may not be the most efficient or user-friendly for direct use in Ada, and so it is often considered useful to go above and beyond making a simple or thin binding and instead craft a small custom library (or thick binding) above the original API to solidify and greatly simplify its use within the Ada language. In this blog post I will describe the design decisions and architecture of OpenGLAda - a custom thick binding to the OpenGL API for Ada, and, in the process, I hope to provide ideas and techniques that may inspire others to contribute their own bindings for similar libraries. Below are some examples based on the classic OpenGL Superbible of what is possible using the OpenGLAda binding and whose complete source can be found on my Github repo for OpenGLAda here along with instructions for setting up an Ada environment: # Background OpenGL, created in 1991 by Silicon Graphics, has had a long history as an industry standard for rendering 3D vector graphics - growing through numerous revisions (currently at version 4.6) both adding new features and deprecating or removing others. As a result, the once simple API has become more complex and difficult to wield at times. Despite this and even with the competition of Microsoft’s DirectX and the creation of new APIs (like Vulkan), OpenGL still remains a big player in the Linux, Mac, and free-software world. Unlike a typical C library, OpenGL has hundreds (maybe even thousands) of implementations, usually provided by graphics hardware vendors. While the OpenGL API itself is considered platform-independent, making use of it does depends heavily on the target platform's graphics and windowing systems. This is due to the fact that rendering requires a so-called OpenGL context consisting of a drawing area on the screen and all associated data needed for rendering. For this reason, there exist multiple glue APIs that enable using OpenGL in conjunction with several windowing systems. # Design Challenges A concept that proliferates the design of OpenGL is graceful degradation - meaning that if some feature or function is unavailable on a target platform the client software may supply a workaround or simply skip the part of the rendering process in which the feature is required. This makes it necessary to query for existing features during run-time. Additionally, the code for querying OpenGL features is not part of the OpenGL API itself and must be provided by us and defined separately for each platform we plan to support. These properties pose the following challenges for our Ada binding: 1. It must include some platform-dependent code, ideally hiding this from the user to enable platform-independent usage. 2. It must access OpenGL features without directly linking to them so that missing features can be handled inside the application. # First Steps Note: I started working on OpenGLAda in 2012 so it only uses the features of the Ada 2005 language level. Some code shown here could be written in a more succinct way with the added constructs in Ada 2012 (most notably aspects and expression functions). To get started on our binding we need to translate subprogram definitions from the standard OpenGL C header into Ada. Since we are writing a thick binding and are going above and beyond directly using the original C function, these API imports should be invisible to the user. Thus, we will define a set of private packages such as GL.API to house all of these imports. A private package can only be used by the immediate parent package and its children, making it invisible for a user of the library. The public package GL and its public child packages will provide the public interface. To translate a C subprogram declaration to Ada, we need to map all C types it uses into equivalent Ada types then essentially change the syntax from C to Ada. For the first import, we choose the following subprogram: void glFlush(); This is a command used to tell OpenGL to execute commands currently stored in internal buffers. It is a very common command and thus is placed directly in the top-level package of the public interface. Since the command has no parameters and returns no values, there are no types involved so we don’t need to care about them for now. Our Ada code looks like this: package GL is procedure Flush; end GL; private package GL.API is procedure Flush; pragma Import (Convention => C, Entity => Flush, External_Name => "glFlush"); end GL.API; package body GL is procedure Flush is begin API.Flush; end Flush; end GL; Instead of providing an implementation of GL.API.Flush in a package body, we use the pragma Import to tell the Ada compiler that we are importing this subprogram from another library. The first parameter is the calling convention, which defines low-level details about how a subprogram call is to be translated into machine code. It is vital that the caller and the callee agree on the same calling convention; a mistake at this point is hard to detect and, in the worst case, may lead to memory corruption during run-time. Note that when defining the implementation of the public subprogram GL.Flush, we cannot use a renames clause like we typically would, because our imported backend subprogram is within a private package. Now, the interesting part: how do we link to the appropriate OpenGL implementation according to the system we are targeting? Not only are there multiple implementations, but their link names also differ. The solution is to use the GPRBuild tool and define a scenario variable to select the correct linker flags: library project OpenGL is type Windowing_System_Type is ("windows", -- Microsoft Windows "x11", -- X Window System (primarily used on Linux) "quartz"); -- Quartz Compositor (the macOS window manager) Windowing_System : Windowing_System_Type := external ("Windowing_System"); for Languages use ("ada"); for Library_Name use "OpenGLAda"; for Source_Dirs use ("src"); package Compiler is for Default_Switches ("ada") use ("-gnat05"); end Compiler; package Linker is case Windowing_System is when "windows" => for Linker_Options use ("-lOpenGL32"); when "x11" => for Linker_Options use ("-lGL"); when "quartz" => for Linker_Options use ("-Wl,-framework,OpenGL"); end case; end Linker; end OpenGL; We will need other distinctions based on the windowing system later, and thus we name the scenario variable Windowing_System accordingly, although, at this point, it would also be sensible to distinguish just the operating system instead. We use Linker_Options instead of Default_Switches in the linker to tell GPRBuild what options we need when linking the final executable. As you can see, the library we link against is called OpenGL32 on Windows and GL on Linux. On MacOS, there is the concept of frameworks which are somewhat more sophisticated software libraries. On the gcc command line, they can be given with "-framework <name>", which gcc hands over to the linker. However, this does not work easily with GPRBuild unless we use the "-Wl,option" flag, whose operation is defined as: Pass "option" as an option to the linker. If option contains commas, it is split into multiple options at the commas. You can use this syntax to pass an argument to the option. At this point, we have almost successfully wrapped our first OpenGL subprogram. However, there is a nasty little detail we overlooked: Windows APIs use a calling convention different than the standard C one. One usually only needs to care about this when linking against the Win32 API, however, OpenGL is thought to be part of the Windows API as we can see in the OpenGL C header: GLAPI void APIENTRY glFlush (void); ... and by digging through the Windows version of this header we then find somewhere, wrapped in some #ifdef's, this line: #define APIENTRY __stdcall This means our target C function has the calling convention stdcall, which is only used on Windows. Thankfully, GNAT supports this calling convention, and moreover, for every system that is not Windows defines it as synonym for the C calling convention. Thus, we can rewrite our import: procedure Flush; pragma Import (Convention => Stdcall, Entity => Flush, External_Name => "glFlush"); With the above code, our first wrapper subprogram is ready. ### Stay tuned for part two where we will cover a basic type system for interfacing with C, error handling, memory management, and more! ## Part two of this article can be found here! ]]> AdaCore at FOSDEM 2018 https://blog.adacore.com/adacore-at-fodsem-2018 Thu, 18 Jan 2018 14:55:02 +0000 Pierre-Marie de Rodat https://blog.adacore.com/adacore-at-fodsem-2018 Every year, free and open source enthusiasts gather at Brussels (Belgium) for two days of FLOSS-related conferences. FOSDEM organizers setup several “developer rooms”, which are venues that host talks on specific topics. This year, the event will happen on the 3rd and 4th of February (Saturday and Sunday) and there is a room dedicated to the Ada programming language. Just like last year and the year before, several AdaCore engineers will be there. We have five talks scheduled: In the Ada devroom: In the Embedded, mobile and automotive devroom: In the Source Code Analysis devroom: Note also in the Embedded, mobile and automotive devroom that the talk from Alexander Senier about the work they are doing at Componolit, which uses SPARK and Genode to bring trust to the Android platform. If you happen to be in the area, please come and say hi! ]]> Leveraging Ada Run-Time Checks with Fuzz Testing in AFL https://blog.adacore.com/running-american-fuzzy-lop-on-your-ada-code Tue, 19 Dec 2017 09:36:08 +0000 Lionel Matias https://blog.adacore.com/running-american-fuzzy-lop-on-your-ada-code Fuzzing is a very popular bug finding method. The concept, very simply, is to continuously inject random (garbage) data as input of a software component, and wait for it to crash. Google's Project Zero team made it one of their major vulnerability-finding tools (at Google scale). It is very efficient at robust-testing file format parsers, antivirus software, internet browsers, javascript interpreters, font face libraries, system calls, file systems, databases, web servers, DNS servers... When Heartbleed came out, people found out that it was, indeed, easy to find. Google even launched a free service to fuzz at scale widely used open-source libraries, and Microsoft created Project Springfield, a commercial service to fuzz your application (at scale also, in the cloud). Writing robustness tests can be tedious, and we - as developers - are usually bad at it. But when your application is on an open network, or just user-facing, or you have thousands of appliance in the wild, that might face problems (disk, network, cosmic rays :-)) that you won't see in your test lab, you might want to double-, triple-check that your parsers, deserializers, decoders are as robust as possible... In my experience, fuzzing causes so many unexpected crashes in so many software parts, that it's not unusual to spend more time doing crash triage than preparing a fuzzing session. It is a great verification tool to complement structured testing, static analysis and code review. Ada is pretty interesting for fuzzing, since all the runtime checks (the ones your compiler couldn't enforce statically) and all the defensive code you've added (through pre-/post-conditions, asserts, ...) can be leveraged as fuzzing targets. Here's a recipe to use American Fuzzy Lop on your Ada code. # American Fuzzy Lop AFL is a fuzzer from Michał Zalewski (lcamtuf), from the Google security team. It has an impressive trophy case of bugs and vulnerabilities found in dozens of open-source libraries or tools. You can even see for yourself how efficient guided fuzzing is in the now classic 'pulling jpegs out of thin air' demonstration on the lcamtuf blog. I invite you to read the technical description of the tool to get a precise idea of the innards of AFL. Installation instructions are covered in the Quick Start Guide, and they can be summed as: 1. Get the latest source 2. Build it (make) 3. Make your program ready for fuzzing (here you have to work a bit) 4. Start fuzzing... There are two main parts of the tool: • afl-clang/afl-gcc to instrument your binary • and afl-fuzz that runs your binary and uses the instrumentation to guide the fuzzing session. ### Instrumentation afl-clang / afl-gcc compiles your code and adds a simple instrumentation around branch instructions. The instrumentation is similar to gcov or profiling instrumentation but it targets basic blocks. In the clang world, afl-clang-fast uses a plug-in to add the instrumentation cleanly (the compiler knows about all basic blocks, and it's very easy to add some code at the start of a basic block in clang). In the gcc world the tool only provides a hacky solution. The way it works is that instead of calling your GCC of predilection, you call afl-gcc. afl-gcc will then call your GCC to output the assembly code generated from your code. To simplify, afl-gcc patches every jump instruction and every label (jump destination) to append an instrumentation block. It then calls your assembler to finish the compilation job. Since it is a pass on assembly code generated from GCC it can be used to fuzz Ada code compiled with GNAT (since GNAT is based on GCC). In the gprbuild world this means calling gprbuild with the --compiler-subst=lang,tool option (see gprbuild manual). Note : afl-gcc will override compilation options to force -O3 -funroll-loops. The reason behind this is that the authors of AFL noticed that those optimization options helped with the coverage instrumentation (unrolling loops will add new 'jump' instructions). With some codebases there can appear a problem with the 'rep ret' instruction. For obscure reasons gcc sometimes insert a 'rep ret' instruction instead of a ‘ret’ (return) instruction. Some info on the gcc mailing list archives and in more detail if you dare on a dedicated website call repzret.org. When AFL inserts its instrumentation code, the 'rep ret' instruction is not correct anymore ('as' complains). Since 'rep ret' is exactly the same instruction (except a bit slower on some AMD arch) as ‘ret’, you can add a step in afl-as (the assembly patching module) to patch the (already patched) assembly code: add the following code at line 269 in afl-as.c (on 2.51b or 2.52b versions):  if (!strncmp(line, "\trep ret", 8)) { SAYF("[LMA patch] afl-as : replace 'rep ret' with (only) 'ret'\n"); fputs("\tret\n", outf); continue; } ... and then recompile AFL. It then works fine, and prints a specific message whenever it encounters the problematic case. I didn't need this workaround for the example programs I chose for this post (you probably won't need it), but it can happen, so here you go... Though a bit hacky, going through assembly and sed-patching it seems the only way to do this on gcc, for now. It's obviously not available on any other arch (power, arm) as such, as afl-as inserts an x86-specifc payload. Someone wrote a gcc plug-in once and it would need some love to be ported to a gcc-6 (recent GNAT) or -8 series (future GNAT). The plug-in approach would also allow to do in-process fuzzing, speed-up the fuzzing process, and ease the fuzzing of programs with a large initialization/set-up time. When you don't have the source code or changing your build chain would be too hard, the afl-fuzz manual mentions a Qemu-based option. I haven't tried it though. ### The test-case generator It takes a bunch of valid inputs to your application, and implements a wide variety of random mutations, runs your application with them and then uses the inserted instrumentation to guide itself to new code paths and avoid staying too much on paths that already crash. AFL looks for crashes. It is expecting a call to abort() (SIGABRT). Its job is to try and crash your software, its search target is "a new unique crash". It's not very common to get a core dump (SIGSEGV/SIGABRT) in Ada with GNAT, even following an uncaught top-level exception. You'll have to help the fuzzer and provoke core dumps on errors you want to catch. A top-level exception by itself won't do it. In the GNAT world you can dump core using the Core_Dump procedure in the GNAT-specific package GNAT.Exception_Actions. What I usually do is let all exceptions bubble up to a top-level exception handler and filter by name, and only crash/abort on the exceptions I'm interested in. And if the bug you're trying to find with fuzzing doesn't crash your application, make it a crashing bug. With all that said, let’s find some open-source libraries to fuzz. # Fuzzing Zip-Ada Zip-Ada is a nice pure-Ada library to work with zip archives. It can open, extract, compress, decompress most of the possible kinds of zip files. It even has implemented recently LZMA compression. It's 100% Ada, portable, quite readable and simple to use (drop the source, use the gpr file, look up the examples and you're set). And it's quite efficient (say my own informal benchmarks). Anyway it's a cool project to contribute to but I'm no compression wizard. Instead, let's try and fuzz it. Since it's a library that can be given arbitrary files, maybe of dubious source, it needs to be robust. I got the source from sourceforge of version 52 (or if you prefer, on github), uncompressed it and found the gprbuild file. Conveniently Zip-Ada comes with a debug mode that enables all possible runtime checks from GNAT, including -gnatVa, -gnato. The zipada.gpr file also references a 'pragma file' (through -gnatec=debug.pra) that contains a 'pragma Initialize_Scalars;' directive so everything is OK on the build side. Then we need a very simple test program that takes a file name as a command-line argument, then drives the library from there. File parsers are the juiciest targets, so let's read and parse a file: we'll open and extract a zip file. For a first program what we're looking for is procedure Extract in the Unzip package: -- Extract all files from an archive (from) procedure Extract (From : String; Options : Option_set := No_Option; Password : String := ""; File_System_Routines : FS_Routines_Type := Null_Routines) Just give it a file name and it will (try to) parse it as an archive and extract all the files from the archive. We also need to give AFL what it needs (abort() / core dump) so let's add a top-level exception block that will do that, unconditionally (at first) on any exception. The example program looks like: with UnZip; use UnZip; with Ada.Command_Line; with GNAT.Exception_Actions; with Ada.Exceptions; with Ada.Text_IO; use Ada.Text_IO; procedure Test_Extract is begin Extract (From => Ada.Command_Line.Argument (1), Options => (Test_Only => True, others => False), Password => "", File_System_Routines => Null_routines); exception when Occurence : others => Put_Line ("exception occured [" & Ada.Exceptions.Exception_Name (Occurence) & "] [" & Ada.Exceptions.Exception_Message (Occurence) & "] [" & Ada.Exceptions.Exception_Information (Occurence) & "]"); GNAT.Exception_Actions.Core_Dump (Occurence); end Test_Extract; And to have it compile, we add it to the list of main programs in the zipada.gpr file. Then let's build: gprbuild --compiler-subst=Ada,/home/lionel/afl/afl-2.51b/afl-gcc -p -P zipada.gpr -Xmode=debug We get a classic gprbuild display, with some additional lines: ... afl-gcc -c -gnat05 -O2 -gnatp -gnatn -funroll-loops -fpeel-loops -funswitch-loops -ftracer -fweb -frename-registers -fpredictive-commoning -fgcse-after-reload -ftree-vectorize -fipa-cp-clone -ffunction-sections -gnatec../za_elim.pra zipada.adb afl-cc 2.51b by <lcamtuf@google.com> afl-as 2.51b by <lcamtuf@google.com> [+] Instrumented 434 locations (64-bit, non-hardened mode, ratio 100%). afl-gcc -c -gnat05 -O2 -gnatp -gnatn -funroll-loops -fpeel-loops -funswitch-loops -ftracer -fweb -frename-registers -fpredictive-commoning -fgcse-after-reload -ftree-vectorize -fipa-cp-clone -ffunction-sections -gnatec../za_elim.pra comp_zip.adb afl-cc 2.51b by <lcamtuf@google.com> afl-as 2.51b by <lcamtuf@google.com> [+] Instrumented 45 locations (64-bit, non-hardened mode, ratio 100%). ... The 2 additional afl-gcc and afl-as steps show up along with a counter of instrumented locations in the assembly code for each unit. So, some instrumentation was inserted. Fuzzers are bad with checksums (http://moyix.blogspot.fr/2016/07/fuzzing-with-afl-is-an-art.html is an interesting dive into what can block afl-fuzz and what can be done, and John Regehr had a blog post on what AFL is bad at). For example, there’s no way for a fuzzing tool to go through a checksum test: it would need to generate only test cases that have a matching checksum. So, to make sure we get somewhere, I removed all checksum tests. There was one for zip CRC. Another one for zip passwords, for similar reasons. After I commented out those tests, I recompiled the test program. Then we’ll need to build a fuzzing environnement: mkdir fuzzing-session mkdir fuzzing-session/input mkdir fuzzing-session/output We also need to bootstrap the fuzzer with an initial corpus that doesn't crash. If there's a test suite, put the correct files in input/. Then afl-fuzz can (finally) be launched: AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=ON \ /home/lionel/afl/afl-2.51b/afl-fuzz -m 1024 -i input -o output ../test_extract @@ -i dir - input directory with test cases -o dir - output directory for fuzzer findings -m megs - memory limit for child process (50 MB) @@ to tell afl to put the input file as a command line argument. By default afl will write to the program's stdin. The AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=ON prelude is to silence a warning from afl-fuzz about how your system handles core dumps (see the man page for core). For afl-fuzz it's a problem because whatever is done to handle core dumps on your system might take some time and afl-fuzz will think the program timed out (although it crashed). For you it can also be a problem : It’s possible (see : some linux distros) that it’d instruct your system to do something (send a UI notification, fill your /var/log/messages, send a crash report e-mail to your sysadmin, …) with core dumps automatically (and you might not care). Maybe check first with your sysadmin… If you’re root on your machine, follow afl-fuzz’s advice and change your /proc/sys/kernel/core_pattern to something sensitive. Let’s go: In less than 2 minutes, afl-fuzz finds several crashes. While it says they’re “unique”, they in fact trigger the same 2 or 3 exceptions. After 3 hours, it “converges” to a list of crashes, and letting it run for 3 days doesn’t bring another one. It got a string of CONSTRAINT_ERRORs: • CONSTRAINT_ERROR : unzip.adb:269 range check failed • CONSTRAINT_ERROR : zip.adb:535 range check failed • CONSTRAINT_ERROR : zip.adb:561 range check failed • CONSTRAINT_ERROR : zip-headers.adb:240 range check failed • CONSTRAINT_ERROR : unzip-decompress.adb:650 range check failed • CONSTRAINT_ERROR : unzip-decompress.adb:712 index check failed • CONSTRAINT_ERROR : unzip-decompress.adb:1384 access check failed • CONSTRAINT_ERROR : unzip-decompress.adb:1431 access check failed • CONSTRAINT_ERROR : unzip-decompress.adb:1648 access check failed I sent those and the reproducers to Gautier de Montmollin (Zip-Ada's maintainer). He corrected those quickly (revisions 587 up to 599). Most of those errors now are raised as Zip-Ada-specific exceptions. He also decided to rationalize the list of raised exceptions that could (for legitimate reasons) be raised from the Zip-Ada decoding code. It also got some ADA.IO_EXCEPTIONS.END_ERROR: • ADA.IO_EXCEPTIONS.END_ERROR : zip.adb:894 • ADA.IO_EXCEPTIONS.END_ERROR : s-ststop.adb:284 instantiated at s-ststop.adb:402 I redid another fuzzing session after all the corrections and improvements confirming the list of exceptions. This wasn’t a lot of work (for me), mostly using the cycles on my machine that I didn’t use, and I got a nice thanks for contributing :-). # Fuzzing AdaYaml AdaYaml is a library to parse YAML files in Ada. Let’s start by cloning the github repository (the one before all the corrections). For those not familiar to git (here's a tutorial) : git clone https://github.com/yaml/AdaYaml.git git checkout 5616697b12696fd3dcb1fc01a453a592a125d6dd Then the source code of the version I tested should be in the AdaYaml folder. If you don't want anything to do with git, there's a feature on github to download a Zip archive of a version of a repository. AdaYaml will ask for a bit more work to fuzz: we need to create a simple example program, then add some compilation options to the GPR files (-gnatVa, -gnato) and to add a pragma configuration file to set pragma Initialize_Scalars. This last option, combined with -gnatVa helps surface accesses to uninitialized variables (if you don't know the option : https://gcc.gnu.org/onlinedocs/gcc-4.6.3/gnat_rm/Pragma-Initialize_005fScalars.html and http://www.adacore.com/uploads/technical-papers/rtchecks.pdf). All those options to make sure we catch the most problems possible with runtime checks. The example program looks like: with Utils; with Ada.Text_IO; with Ada.Command_Line; with GNAT.Exception_Actions; with Ada.Exceptions; with Yaml.Dom; with Yaml.Dom.Vectors; with Yaml.Dom.Loading; with Yaml.Dom.Dumping; with Yaml.Events.Queue; procedure Yaml_Test is S : constant String := Utils.File_Content (Ada.Command_Line.Argument (1)); begin Ada.Text_IO.Put_Line (S); declare V : constant Yaml.Dom.Vectors.Vector := Yaml.Dom.Loading.From_String (S); E : constant Yaml.Events.Queue.Reference := Yaml.Dom.Dumping.To_Event_Queue (V); pragma Unreferenced (E); begin null; end; exception when Occurence : others => Ada.Text_IO.Put_Line (Ada.Exceptions.Exception_Information (Occurence)); GNAT.Exception_Actions.Core_Dump (Occurence); end Yaml_Test; The program just reads a file and parses it, transforms it into a vector of DOM objects, then transforms those back to a list of events (see API docs). The YAML reference spec may help explain a bit what's going on here. Using the following diagram, and for those well-versed in YAML: • the V variable (of our test program) is a "Representation" generated via the Parse -> Compose path • the E variable is an "Event Tree" generated from V via "Serialize" (so, going back down to a lower-level representation from the DOM tree). For this specific fuzzing test, the idea is not to stop at the first stage of parsing but also to go a bit through the data that was decoded, and do something with it (here we stop short of a round-trip to text, we just go back to an Event Tree). Sometimes a parser faced with incoherent input will keep on going (fail silently) and won't fill (initialize) some fields. The GPR files to patch are yaml.gpr and the parser_tools.gpr subproject. The first fuzzing session triggers “expected” exceptions from the parser: • YAML.PARSER_ERROR • YAML.COMPOSER_ERROR • LEXER.LEXER_ERROR • YAML.STREAM_ERROR (as it turns out, this one is also unexpected... more on this one later) Which should happen with malformed input. So to get unexpected crashes and only those, let’s filter them in the top-level exception handler. exception when Occurence : others => declare N : constant String := Ada.Exceptions.Exception_Name (Occurence); begin Ada.Text_IO.Put_Line (Ada.Exceptions.Exception_Information (Occurence)); if N = "YAML.PARSER_ERROR" or else N = "LEXER.LEXER_ERROR" or else N = "YAML.STREAM_ERROR" or else N = "YAML.COMPOSER_ERROR" then null; else GNAT.Exception_Actions.Core_Dump (Occurence); end if; end Yaml_Test; Then, I recompiled, used some YAML example files as a startup corpus, and started fuzzing. After 4 minutes 30 seconds, the first crashes appeared. I let it run for hours, then a day and found a list of issues. I sent all of those and the reproducers to Felix Krause (maintainer of the AdaYaml project). He was quick to answer and analyse all the exceptions. Here are his comments: • ADA.STRINGS.UTF_ENCODING.ENCODING_ERROR : bad input at Item (1) I guess this happens when you use a unicode escape sequence that codifies a code point beyond the unicode range (0 .. 0x10ffff). Definitely an error and should raise a Lexer_Error instead. … and he created issue https://github.com/yaml/AdaYaml/issues/4 • CONSTRAINT_ERROR : text.adb:203 invalid data This hints to a serious error in my custom string allocator that can lead to memory corruption. I have to investigate to be able to tell what goes wrong here. … and then he found the problem: https://github.com/yaml/AdaYaml/issues/5 • CONSTRAINT_ERROR : Yaml.Dom.Mapping_Data.Node_Maps.Insert: attempt to insert key already in map This happens when you try to parse a YAML mapping that has two identical keys (this is conformant to the standard which disallows that). However, the error should be catched and a Compose_Error should be raised instead. … and he opened https://github.com/yaml/AdaYaml/issues/3 • CONSTRAINT_ERROR : yaml-lexer-evaluation.adb:283 overflow check failed • CONSTRAINT_ERROR : yaml-lexer-evaluation.adb:286 overflow check failed • CONSTRAINT_ERROR : yaml-lexer-evaluation.adb:289 overflow check failed This is, thankfully, an obvious error: Hex escape sequence in the input may have up to eight nibbles, so they represent a value range of 0 .. 2**32 - 1. I use, however, a Natural to store that value, which is a subtype of Integer, which is of platform-dependent range – in this case, it is probably 32-bit, but since it is signed, its range goes only up to 2**31 - 1. This would suffice in theory, since the largest unicode code point is 0x10ffff, but AdaYaml needs to catch cases that exceed this range. … and attached to https://github.com/yaml/AdaYaml/issues/4 • STORAGE_ERROR : stack overflow or erroneous memory access … and he created issue https://github.com/yaml/AdaYaml/issues/6 and changed the parsing mode of nested structures to avoid stack overflows (no more recursion). There were also some “hangs”: AFL monitors the execution time of every test case, and flags large timeouts as hangs, to be inspected separately from crashes. Felix took the examples with a long execution time, and found an issue with the hashing of nodes. With all those error cases, Felix created an issue that references all the individual issues, and corrected them. After all the corrections, Felix gave me a analysis of the usefulness of the test: Your findings mirror the test coverage of the AdaYaml modules pretty well: There was no bug in the parser, as this is the most well-tested module. One bug each was found in the lexer and the text memory management, as these modules do have high test coverage, but only because they are needed for the parser tests. And then three errors in the DOM code as this module is almost completely untested. After reading a first draft of this blog post, Felix noted that YAML.STREAM_ERROR was in fact an unexpected error in my test program. Also, you should not exclude Yaml.Stream_Error. This error means that a malformed event stream has been encountered. Parsing a YAML input stream or serializing a DOM structure should *always* create a valid event stream unless it raises an exception – hence getting Yaml.Stream_Error would actually show that there's an internal error in one of those components. [...] Yaml.Stream_Error would only be an error with external cause if you generate an event stream manually in your code. I filtered this exception because I'd encountered it in the test suite available in the AdaYaml github repository (it is in fact a copy of the reference yaml test-suite). I wanted to use the complete test suite as a starting corpus, but examples 8G76 and 98YD crashed and it prevented me from starting the fuzzing session, so instead of removing the crashing test cases, I filtered out the exception... The fact that 2 test cases from the YAML test suite make my simple program crash is interesting, but can we find more cases ? I removed those 2 files from the initial corpus, and I focused the small test program on finding cases that crash on a YAML.STREAM_ERROR: exception when Occurence : others => declare N : constant String := Ada.Exceptions.Exception_Name (Occurence); begin Ada.Text_IO.Put_Line (Ada.Exceptions.Exception_Information (Occurence)); if N = "YAML.STREAM_ERROR" then GNAT.Exception_Actions.Core_Dump (Occurence); end if; end Yaml_Test; In less than 5 minutes, AFL finds 5 categories of crashes: • raised YAML.STREAM_ERROR : Unexpected event (expected document end): ALIAS • raised YAML.STREAM_ERROR : Unexpected event (expected document end): MAPPING_START • raised YAML.STREAM_ERROR : Unexpected event (expected document end): SCALAR • raised YAML.STREAM_ERROR : Unexpected event (expected document end): SEQUENCE_START • raised YAML.STREAM_ERROR : Unexpected event (expected document start): STREAM_END Felix was quick to answer: Well, seems like you've found a bug in the parser. This looks like the parser may generate some node after the first root node of a document, although a document always has exactly one root node. This should never happen; if the YAML contains multiple root nodes, this should be a Parser_Error. I opened a new issue about this, to be checked later. # Fuzzing GNATCOLL.JSON JSON parsers are a common fuzzing target, not that different from YAML. This could be interesting. Following a similar pattern as other fuzzing sessions, let’s first build a simple unit test that reads and parses an input file given at the command-line (first argument), using GNATCOLL.JSON (https://github.com/AdaCore/gnatcoll-core/blob/master/src/gnatcoll-json.ads). This time I massaged one of the unit tests into a simple “read a JSON file all in memory, decode it and print it” test program, that we’ll use for fuzzing. Note: for the exercise here I used GNATCOLL GPL 2016, because that's what I was using for a personal project. You should probably use the latest version when you do this kind of testing, at least before you report your findings. The test program is very simple: procedure JSON_Fuzzing_Test is Filename : constant String := Ada.Command_Line.Argument (1); JSON_Data : Unbounded_String := File_IO.Read_File (Filename); begin declare Value : GNATCOLL.JSON.JSON_Value := GNATCOLL.JSON.Read (Strm => JSON_Data, Filename => Filename); begin declare New_JSON_Data : constant Unbounded_String := GNATCOLL.JSON.Write (Item => Value, Compact => False); begin File_IO.Write_File (File_Name => "out.json", File_Contents => New_JSON_Data); end; end; end JSON_Fuzzing_Test; The GPR file is simple with a twist : to make sure we compile this program with gnatcoll, and that when we’ll use afl-gcc we’ll compile the library code with our substitution compiler, we’ll “with” the actual “gnatcoll_full.gpr” (actual gnatcoll source code !) and not the one for the compiled library. Then we build the project in "debug" mode, to get all the runtime checks available: gprbuild -p -P gnat_json_fuzzing_test.gpr -XGnatcoll_Build=Debug  Then I tried to find a test corpus. One example is https://github.com/nst/JSONTestSuite cited in “Parsing JSON is a minefield”. There’s a test_parsing folder there that contains 318 test cases. Trying to run them first on the new simple test program shows already several "crash" cases: • nice GNATCOLL.JSON.INVALID_JSON_STREAM exceptions • Numerical value too large to fit into an IEEE 754 float • Numerical value too large to fit into a Long_Long_Integer • Unexpected token • Expected ',' in the array value • Unfinished array, expecting ending ']' • Expecting a digit after the initial '-' when decoding a number • Invalid token • Expecting digits after 'e' when decoding a number • Expecting digits after a '.' when decoding a number • Expected a value after the name in a JSON object at index N • Invalid string: cannot find ending " • Nothing to read from stream • Unterminated object value • Unexpected escape sequence … which is fine, since you’ll expect this specific exception when parsing user-provided JSON. Then I got to: • raised ADA.STRINGS.INDEX_ERROR : a-strunb.adb:1482 • n_string_1_surrogate_then_escape_u1.json • n_string_1_surrogate_then_escape_u.json • n_string_invalid-utf-8-in-escape.json • n_structure_unclosed_array_partial_null.json • n_structure_unclosed_array_unfinished_false.json • n_structure_unclosed_array_unfinished_true.json For which I opened https://github.com/AdaCore/gnatcoll-core/issues/5 • raised CONSTRAINT_ERROR : bad input for 'Value: "16#??????"]#" • n_string_incomplete_surrogate.json • n_string_incomplete_escaped_character.json • n_string_1_surrogate_then_escape_u1x.json For which I opened https://github.com/AdaCore/gnatcoll-core/issues/6 • … and STORAGE_ERROR : stack overflow or erroneous memory access • n_structure_100000_opening_arrays.json This last one can be worked around using with ulimit -s unlimited (so, removing the limit of stack size). Still, beware of your stack when parsing user-provided JSON. For AdaYaml similar problems appeared, and were robustified, and I’m not sure whether this potential “denial of service by stack overflow” should be classified as a bug, it’s at least something to know when using GNATCOLL.JSON on user-provided JSON data (I’m guessing most API endpoints these days). Those exceptions are the ones you don’t expect, and maybe didn’t put a catch-all there. A clean GNATCOLL.JSON.INVALID_JSON_STREAM exception might be better. Note: on all those test cases, I didn’t check whether the results of the tests were OK. I just checked for crashes. It might be very interesting to check the corrected of GNATCOLL.JSON against this test suite. Now let’s try through fuzzing to find more cases where you don’t get a clean GNATCOLL.JSON.INVALID_JSON_STREAM. The first step is adding a final “catch-all” exception handler to abort only on unwanted exceptions (not all of them): exception -- we don’t want to abort on a “controlled” exception when GNATCOLL.JSON.INVALID_JSON_STREAM => null; when Occurence : others => Ada.Text_IO.Put_Line ("exception occured for " & Filename & " [" & Ada.Exceptions.Exception_Name (Occurence) & "] [" & Ada.Exceptions.Exception_Message (Occurence) & "] [" & Ada.Exceptions.Exception_Information (Occurence) & "]"); GNAT.Exception_Actions.Core_Dump (Occurence); end JSON_Fuzzing_Test; And then clean the generated executable: gprclean -r -P gnat_json_fuzzing_test.gpr -XGnatcoll_Build=Debug Then rebuild it using afl-gcc: gprbuild --compiler-subst=Ada,/home/lionel/afl/afl-2.51b/afl-gcc -p -P gnat_json_fuzzing_test.gpr -XGnatcoll_Build=Debug Then we generate an input corpus for AFL, by keeping only the files that didn’t generate a call to abort() with the new JSON_Fuzzing_Test test program. On first launch (AFL_I_DONT_CARE_ABOUT_MISSING_CRASHES=ON /home/lionel/aws/afl/afl-2.51b/afl-fuzz -m 1024 -i input -o output ../json_fuzzing_test @@), afl-fuzz complains: [*] Attempting dry run with 'id:000001,orig:i_number_huge_exp.json'... [-] The program took more than 1000 ms to process one of the initial test cases. This is bad news; raising the limit with the -t option is possible, but will probably make the fuzzing process extremely slow. If this test case is just a fluke, the other option is to just avoid it altogether, and find one that is less of a CPU hog. [-] PROGRAM ABORT : Test case 'id:000001,orig:i_number_huge_exp.json' results in a timeout Location : perform_dry_run(), afl-fuzz.c:2776 … and it’s true, the i_number_huge_exp.json file takes a long time to be parsed: [lionel@lionel fuzzing-session]$ time ../json_fuzzing_test input/i_number_huge_exp.json
input/i_number_huge_exp.json:1:2: Numerical value too large to fit into an IEEE 754 float
real    0m7.273s
user    0m3.717s
sys     0m0.008s

My machine isn’t fast, but still, this is a denial of service waiting to happen. I opened a ticket just in case.

Anyway let’s remove those input files that gave a timeout before we even started the fuzzing (the other ones are n_structure_100000_opening_arrays.json and n_structure_open_array_object.json).

During this first afl-fuzz run, in the start phase, a warning appears a lot of times:

[!] WARNING: No new instrumentation output, test case may be useless.

AFL looks through the whole input corpus, and checks whether input files have added any new basic block coverage to the already tested examples (also from the input corpus).

The initial phase ends with:

[!] WARNING: Some test cases look useless. Consider using a smaller set.
[!] WARNING: You probably have far too many input files! Consider trimming down.

To be the most efficient, afl-fuzz needs the slimmest input corpus with the highest basic block coverage, the most representative of all the OK code paths, and the least redundant possible. You can look through the afl-cmin and afl-tmin tools to minimize your input corpus.

For this session, let’s keep the test corpus as it is (large and redundant), and start the fuzzing session.

In the first seconds of fuzzing, we already get the following state:

Already 3 crashes, and 2 “hangs”. Looking through those, it seems afl-fuzz already found by itself examples of “ADA.STRINGS.INDEX_ERROR : a-strunb.adb:1482” and “CONSTRAINT_ERROR : bad input for 'Value: "16#?????”, although I removed from the corpus all files that showed those problems.

Same thing with the “hang”, afl-fuzz found an example of large float number, although I removed all “*_huge_*” float examples.

Let’s try and focus on finding something else than the ones we know.

I added the following code in the top-level exception handler:

   when Occurence : others =>
declare
Text : constant String := Ada.Exceptions.Exception_Information (Occurence);
begin
if    Ada.Strings.Fixed.Index (Source => Text, Pattern => "bad input for 'Value:") /= 0 then return;
elsif Ada.Strings.Fixed.Index (Source => Text, Pattern => "a-strunb.adb:1482") /= 0 then return;
end if;
end;

It’s very hacky but it’ll remove some parasites (i.e. the crashes we know) from the crash bin.

Let’s restart the fuzzing session (remove the output/ directory, recreate it, and call afl-fuzz again).

Now after 10 minutes, no crash had occured, so I let the fuzzer run for 2 days straight, and it didn’t find any crash or hang other than the ones already triggered by the test suite.

It did however find some additional stack overflows (with examples that open a lot of arrays) even though I had put 1024m as a memory limit for afl-fuzz… Maybe something to look up...

# What next?

If you want to dive deeper in the subject of fuzzing with AFL, here's a short reading list for you:

• Even simpler, if you have an extensive test case list, you can use afl-cmin (a corpus minimizer) to directly fuzz your parser or application efficiently, see the great success of AFL on sqlite.
• The fuzzing world took on the work of lcamtuf and you can often hear about fuzzing-specific passes in clang/llvm to help fuzzing where it's bad at (checksums, magic strings, whole-string comparisons...).
• There's a lot of tooling around afl-fuzz: aflgo directs the focus of a fuzzing session to specific code parts, and Pythia helps evaluate the efficiency of your fuzzing session. See also afl-cov for live coverage analysis.

If you find bugs, or even just perform a fuzzing pass on your favorite open-source software, don't hesitate to get in touch with the maintainers of the project. From my experience, most of the time, maintainers will be happy to get free testing. Even if it's just to say that AFL didn't find anything in 3 days... It's already a badge of honor :-).

#### Thanks:

Many thanks to Yannick Moy that sparked the idea of this blog post after I talked his ear off for a year (was it two?) about AFL and fuzzing in Ada, and helped me proof-read it. Thanks to Gautier and Felix who were very reactive and nice about the reports, and who took some time to read drafts of this post. All your suggestions were very helpful.

]]>

Libadalang has come a long way since the last time we blogged about it. In the past 6 months, we have been working tirelessly on name resolution, a pretty complicated topic in Ada, and it is finally ready enough that we feel ready to blog about it, and encourage people to try it out.

WARNING: While pretty far along, the work is still not finished. It is expected that some statements and declarations are not yet resolved. You might also run into the occasional crash. Feel free to report that on our github!

In our last blog post, we learned how to use Libadalang’s lexical and syntactic analyzers in order to highlight Ada source code. You may know websites that display source code with cross-referencing information: this makes it possible to navigate from references to declarations. For instance elixir, Free Electrons’ Linux source code explorer: go to a random source file and click on an identifier. This kind of tool makes it very easy to explore an unknown code base.

So, we extended our code highlighter to generate cross-references links, as a showcase of Libadalang’s semantic analysis abilities. If you are lazy, or just want to play with the code, you can find a compilable set of source files for it at Libadalang’s repository on GitHub (look for ada2web.adb). If you are interested in how to use name resolution in your own programs, we will use this blog post to show how to use Libadalang’s name resolution to expand our previous code highlighter.

Note that if you haven’t read the previous blog post, we recommend you to read it as below, we assume familiarity with topics from it.

## Where are my source files?

Unlike lexical and syntactic analysis, which process source files separately, semantic analysis works on a set of source files, or more precisely on a source files plus all its dependencies. This is logical: in order to understand an object declaration in foo.ads, one needs to know about the corresponding type, and if the type is declared in another source file (say bar.ads), both files are required for analysis.

By default, Libadalang assumes that all source files are in the current directory. That’s enough for toy source files, but not at all for real world projects, which are generally spread over multiple directories in a complex nesting scheme. Libadalang can’t know about the files layout of all Ada projects in the world, so we created an abstraction that enables anyone to tell it how to reach source files: the Libadalang.Analysis.Unit_Provider_Interface interface type. This type has exactly one abstract primitive: Get_Unit which, given a unit name and a unit kind (specification or body?) calls Analysis_Context’s Get_From_File or Get_From_Buffer to create the corresponding analysis unit.

In the context of a source code editor (for instance), this allows Libadalang to query a source file even if this file exists only in memory, not in a real source file, or if it’s more up-to-date in memory. Using a custom unit provider in Libadalang is easy: dynamically allocate a concrete implementation of this interface, then pass it to the Unit_Provider formal in Analysis_Context’s constructor: the Create function. Libadalang will take care of deallocating this object when the context is destroyed.

declare
UP  : My_Unit_Provider_Access :=
new My_Unit_Provider_Type …;
Ctx : Analysis_Context := Create (Unit_Provider => UP);
--  UP will be queried when performing name resolution
begin
--  Do useful things, and then when done…
Destroy (Ctx);
end;

Nowadays, a lot of Ada projects use GPRbuild and thus have a project file. That’s fortunate: project files give us exactly the information Libadalang needs: where are source files, what’s their naming scheme. Because of this, Libadalang provides a tagged type that implements this interface to deal with project files: Project_Unit_Provider_Type, from the Libadalang.Unit_Files.Projects package. In order to do this, one first need to load the project file using GNATCOLL.Projects:

declare
Project_File : GNATCOLL.VFS.Virtual_File;
Project      : GNATCOLL.Projects.Project_Tree_Access;
Env          : GNATCOLL.Projects.Project_Environment_Access;
begin
--  First load the project file
Project := new Project_Tree;
Initialize (Env);

--  Initialize Project_File, set the target, create
--  scenario variables, …

--  Now create the unit provider and the analysis context.
--  Is_Project_Owner is set to True so that the project
--  is deallocated when UP is destroyed.
UP := new Project_Unit_Provider_Type’
(Create (Project, Env, True));
Ctx := Create (Unit_Provider => UP);

--  Do useful things, and then when done…
Destroy (Ctx);
end;

Now that Libadalang knows where the source files are, we can ask it to resolve names!

Just like in the highligther, most of the website generator will consist of asking Libadalang to parse source files (Get_From_File), checking for lexing/parsing errors (Has_Diagnostics, Diagnostics) and then dealing with AST nodes and tokens in analysis units. The new bits here are turning identifiers into hypertext links to redirect to their definition. As for highlighting classes, we do this token annotation with an array and a tree traversal:

Unit : Analysis_Unit := …;
--  Analysis unit to process

Xrefs : array (1 .. Token_Count (Unit)) of Basic_Decl :=
(others => No_Basic_Decl);
--  For each token, the declaration to which the token should
--  link or No_Basic_Decl for no cross-reference.

function Process_Node
--  Callback for AST traversal. For string literals and
--  identifiers, annotate the corresponding in Xrefs to the
--  designated declaration, if found.

With these declarations, we can do the annotations easily:

Root (Unit).Traverse (Process_Node’Access);

But how Process_Node does its magic? That’s easy too:

function Process_Node
(Node : Ada_Node’Class) return Visit_Status is
begin
--  Annotate only tokens for string literals and
--  identifiers.
then
return Into;
end if;

declare
Token : constant Token_Type :=
Node.As_Single_Tok_Node.F_Tok;
Idx   : constant Natural := Natural (Index (Token));
Decl  : Basic_Decl renames Xrefs (Idx);
begin
Decl := Node.P_Referenced_Decl;
exception
when Property_Error => null;
end;
end Process_Node;

String literal and identifier nodes both inherit from the Single_Tok_Node abstract node, hence the conversion to retrieve the underlying token. Then we locate which cell in the Xrefs array they correspond to. And finally we fill it with the result of the P_Referenced_Decl primitive. This function tries to fetch the declaration corresponding to Node. Easy I said!

What’s the exception handler for, you might ask, though? What we call AST node properties (all functions whose name starts with P_) can raise Property_Error exceptions. These can happen if Libadalang works on invalid Ada sources and cannot find query results. As name resolution is still actively developed, it can happen that this exception is raised even for valid source code: if that happens to you, please report this bug! Note that if a property raises an exception that is not a Property_Error, this is another kind of bug: please report it too!

## Bind it all together

Now we have a list of Basic_Decl nodes to create hypertext links, but how can we do that? The trick is to get the name of the source file that contains this declaration, plus its source location:

Decl_Unit : constant Analysis_Unit := Decl.Get_Unit;
Decl_File : constant String := Get_Filename (Decl_Unit);
Decl_Line : constant Langkit_Support.Slocs.Line_Number :=
Decl.Sloc_Range.Start_Line;

Then you can turn this information into a hypertext link. For example, if you generate X.html for the X source file (foo.ads.html for foo.ads, …) and generate LY HTML anchors for line number Y:

Line_No : constant String :=
Natural'Image (Natural (Decl_Line));
Href : constant String :=
Decl_File & ".html#L"
& Line_No (Line_No'First + 1 .. Line_No'Last);

Some amount of plumbing is still needed to have a complete website generator:

• get a list of all source files to process in the loaded project, using GNATCOLL.Projects’ API;

• actually output HTML code: code from the previous blog can be reused and updated to do this;
• generate an index HTML file as an entry point for navigation.

But as usual, covering all these topics will get out of the scope of this blog post and will make an unreasonable long essay. So thank you once more for reading this post to the end!

]]>

## Summary

The Ada IoT Stack consists of an lwIp (“lightweight IP”) stack implementation written in Ada, with an associated high-level protocol to support embedded device connectivity nodes for today’s IoT world. The project was developed for the Make With Ada 2017 competition based on existing libraries and ported to embedded STM32 devices.

## Motivation

Being a huge fan of IoT designs, I originally planned to work on a real device such an IoT node or gateway, while using the Ada language. I really enjoy writing programs but my roots are in hardware (I earned a B.S. in electronic engineering). Back then I was not really a programmer, but if you can master the intricacies of hardware I think that experience will eventually help make you a good programmer; I’m not sure it works the same way in the other direction.

With so many programming languages out there, it's difficult to gain experience with all of them. In my case I was not even aware of the Ada language until I found out about the contest. That got me to learn Ada: there is a saying that without deadlines nobody will finish on time, and the contest supplied both the motivation and a deadline. For me the best way to learn something is not to read a book chapter by chapter. Sometimes you need to read a “getting starting” guide to gain a basic knowledge, but after that if you don't have a problem to solve, your motivation might come to an end. I made my choice to continue, with the IoT project.

## Network Stack

When I started the IoT project I soon realized that the Ada Drivers Library didn't provide a TCP/IP stack. The Etherscope software by Mr. Carrez from the 2016 Make with Ada contest provided the Ethernet connectivity and UDP protocol but not TCP. I started to look at how to implement the TCP/IP stack, but due to my lack of experience in Ada and the fact that contest was running for almost two months it would not have worked to do something from scratch. So I wrote to Mr. Chouteau (at AdaCore) asking for information about any Ada libraries that implement a TCP stack, and he referred me to a lwIp implementation in Ada and Spark 2014. It was on a Spark 2014 git repository that had not shown up on a Google search. As far as I could tell, the code was only tested on a Linux-flavor OS using a TAP interface. I spent a couple of weeks studying the code and getting it to work on my debian box, which was a little hacky at the end since the TAP C driver implementation (system calls) didn’t work as expected; I ended up coding the TAP interface by hand.

A TAP device operates at layer 2 of the OSI model, which means Ethernet frames. That makes sense here since lwIp implements networking from Ethernet datagrams. Then I realized that I could combine the Etherscope project with the lwIp implementation. After removing several parts of the Etherscope project to get only Ethernet Frames I was ready to feed the lwIp stack. It was not so easy, I spent some time porting the lwIp Ada code to the Ada version for embedded ARM. One obstacle I found was the use of xml files to describe the network datagrams and other structures in a way that's used by a program called xmlada to generate some Ada body files. These describe things like bit and byte positions of TCP flags or fields within the datagram. The problem was that the ARM version don't provide xmlada, so I end up copying the generated files in my project.

After quite some time I got the lwIp stack to work on my STM32F769I board. This was no easy task, especially because the STLink debugger is not so easy to work with. (For example semihosting is basically the only way to have debugger output in the form of "printf". This is really slow and basically interrupts the flow of program execution in a nasty way. The problem here is that the ST board doesn't provide a JTAG interface to the Cortex M4/M7 device, and the STlink on board doesn't have an SWO line connection.)

## IoT Stack

The TCP/IP stack was just the beginning; it was really nice to see it working, but quickly gets boring. The original lwIp implements a TCP echo server: you open a socket, connect and then anything you send is replied by the server, which is not very useful for IoT. So I felt I was not making real progress, at least toward something that would give the judges a tangible project to evaluate. Again I was in a rush, this time with more knowledge of Ada but, as before, without wanting to write something from scratch.

One day I found the Simple Components Ada code by Mr. Kazakov. I really got to love it, but the problem was that after reading it a little bit I felt similar to the day I assisted my first German language "unterricht" years ago. I decided to continue spending time reading it until I finally figured out how to start porting it to my lwIp implementation. The first thing I ported was the MQTT client code, because of its simplicity in terms of dependencies on other “Simple Components” classes if we can refer to them like that. One problem solved here was the change in paradigm. Simple Components used Ada GNAT Sockets, and my lwip basically uses a callback scheme:, two different worlds, but previous experience with sockets helps since you know what the code is doing and what it should do in the new environment. At the end, the MQTT port consumed more time than expected since I not only ported the Simple Components but I also added code from the existing MQTT lwIp implementation in C to make up for the lack of timers.

It was really difficult for me to figure out how the connection state can be recovered when the callback executes. For example when a connection is made, certain variables are initialized and kept in a data structure, but after the callback returns, the structure needs to be preserved so that it can be recovered by a connection event and used in the corresponding callback.

The MQTT client gave me the insights and the experience to continue working and to try the more complicated HTTP protocol, this time improving the callback association. The port of the HTTP server was where problems with the lwIp implementation start to arise. I am almost sure that the lwIp code was only tested in a specific TCP/IP echo-controlled environment; the problem is that a TCP connection can behave differently, or more precisely has different scenarios (e.g., when closing a connection), so I ended up "patching the code" to behave as closely as possible to the standard. I also fixed some memory allocation problems with the lwIp Pcb's of the original stack. Nonetheless if you decide to try the code please be aware that the code should be treated as a "development " version.

Ultimately there was not enough time to finish my IoT Node as I had initially intended. The good part is that I really enjoy solving this kind of problem.

## Work in progress

I really was impressed with Ada, it has the power to do things that in other languages like C/C++ would be much too prone error-prone. The lack of a good debugger was offset by the increased productivity in writing code that had fewer bugs in the first place. I hope to have some time to continue working on this project in the near future.

]]>

As we see the importance of software grow in applications, the quality of that software has become more and more important. Even outside the mission- and safety-critical arena customers are no longer accepting software failures (the famous blue screens of death, and there are many...). Ada has a very strong answer here and we are seeing more and more interest in using the language from a range of industries. It is for this reason that we have completed our product line by including an entry-level offer for C/C++ developers wanting to switch to Ada and reinforced our existing offer with GNAT Pro Assurance for programmers building the most robust software platforms with life cycles spanning decades.

This recent press release explains the positioning of the GNAT Pro family. Details of the full product range can be found here.

]]>
There's a mini-RTOS in my language https://blog.adacore.com/theres-a-mini-rtos-in-my-language Thu, 23 Nov 2017 10:06:15 +0000 Fabien Chouteau https://blog.adacore.com/theres-a-mini-rtos-in-my-language

In this blog post I will focus on the embedded side of things. First because it's what I like, and also because it's much more simple :)

For real-time and embedded applications, Ada defines a profile called Ravenscar. It's a subset of the language designed to help schedulability analysis, it is also more compatible with platforms such as micro-controllers that have limited resources.

So this will not be a complete lecture on Ada tasking. I might do a follow-up with some more tasking features, if you ask for it in the comments ;)

So the first thing is to create tasks, right?

There are two ways to create tasks in Ada, first you can declare and implement a single task:

   --  Task declaration
task My_Task;
   --  Task implementation
begin
--  Do something cool here...
end My_Task;

If you have multiple tasks doing the same job or if you are writing a library, you can define a task type:

   --  Task type declaration
task type My_Task_Type;
   --  Task type implementation
begin
--  Do something really cool here...
end My_Task_Type;

And then create as many tasks of this type as you want:

   T1 : My_Task_Type;
T2 : My_Task_Type;

One limitation of Ravenscar compared to full Ada, is that the number of tasks has to be known at compile time.

# Time

The timing features of Ravenscar are provided by the package (you guessed it) Ada.Real_Time.

In this package you will find:

•  a definition of the Time type which represents the time elapsed since the start of the system
•  a definition of the Time_Span type which represents a period between two Time values
•  a function Clock that returns the current time (monotonic count since the start of the system)
•  Various sub-programs to manipulate Time and Time_Span values

The Ada language also provides an instruction to suspend a task until a given point in time: delay until.

Here's an example of how to create a cyclic task using the timing features of Ada.

  task body My_Task is
Period       : constant Time_Span := Milliseconds (100);
Next_Release : Time;
begin
--  Set Initial release time
Next_Release := Clock + Period;

loop
--  Suspend My_Task until the Clock is greater than Next_Release
delay until Next_Release;

--  Compute the next release time
Next_Release := Next_Release + Period;

--  Do something really cool at 10Hz...
end loop;

end My_Task;

# Scheduling

Ravenscar has priority-based preemptive scheduling. A priority is assigned to each task and the scheduler will make sure that the highest priority task - among the ready tasks - is executing.

A task can be preempted if another task of higher priority is released, either by an external event (interrupt) or at the expiration of its delay until statement (as seen above).

If two tasks have the same priority, they will be executed in the order they were released (FIFO within priorities).

Task priorities are static, however we will see below that a task can have its priority temporary escalated.

The task priority is an integer value between 1 and 256, higher value means higher priority. It is specified with the Priority aspect:

   Task My_Low_Priority_Task
with Priority => 1;

with Priority => 2;

# Mutual exclusion and shared resources

In Ada, mutual exclusion is provided by the protected objects.

At run-time, the protected objects provide the following properties:

• There can be only one task executing a protected operation at a given time (mutual exclusion)
• There can be no deadlock

In the Ravenscar profile, this is achieved with Priority Ceiling Protocol.

A priority is assigned to each protected object, any tasks calling a protected sub-program must have a priority below or equal to the priority of the protected object.

When a task calls a protected sub-program, its priority will be temporarily raised to the priority of the protected object. As a result, this task cannot be preempted by any of the other tasks that potentially use this protected object, and therefore the mutual exclusion is ensured.

The Priority Ceiling Protocol also provides a solution to the classic scheduling problem of priority inversion.

Here is an example of protected object:

   --  Specification
protected My_Protected_Object
with Priority => 3
is

procedure Set_Data (Data : Integer);
--  Protected procedues can read and/or modifiy the protected data

function Data return Integer;
--  Protected functions can only read the protected data

private

--  Protected data are declared in the private part
PO_Data : Integer := 0;
end;
   --  Implementation
protected body My_Protected_Object is

procedure Set_Data (Data : Interger) is
begin
PO_Data := Data;
end Set_Data;

function Data return Integer is
begin
return PO_Data;
end Data;
end My_Protected_Object;

# Synchronization

Another cool feature of protected objects is the synchronization between tasks.

It is done with a different kind of operation called an entry.

An entry has the same properties as a protected procedure except it will only be executed if a given condition is true. A task calling an entry will be suspended until the condition is true.

This feature can be used to synchronize tasks. Here's an example:

   protected My_Protected_Object is
procedure Send_Signal;
entry Wait_For_Signal;
private
We_Have_A_Signal : Boolean := False;
end My_Protected_Object;
   protected body My_Protected_Object is

procedure Send_Signal is
begin
We_Have_A_Signal := True;
end Send_Signal;

entry Wait_For_Signal when We_Have_A_Signal is
begin
We_Have_A_Signal := False;
end Wait_For_Signal;
end My_Protected_Object;

# Interrupt Handling

Protected objects are also used for interrupt handling. Private procedures of a protected object can be attached to an interrupt using the Attach_Handler aspect.

   protected My_Protected_Object
with Interrupt_Priority => 255
is

private

procedure UART_Interrupt_Handler
with Attach_Handler => UART_Interrupt;

end My_Protected_Object;

Combined with an entry it provides and elegant way to handle incoming data on a serial port for instance:

   protected My_Protected_Object
with Interrupt_Priority => 255
is
entry Get_Next_Character (C : out Character);

private
procedure UART_Interrupt_Handler
with Attach_Handler => UART_Interrupt;

We_Have_A_Char : Boolean := False;
end
   protected body My_Protected_Object is

entry Get_Next_Character (C : out Character) when We_Have_A_Char is
begin
We_Have_A_Char := False;
end Get_Next_Character;

procedure UART_Interrupt_Handler is
begin
We_Have_A_Char := True;
end UART_Interrupt_Handler;
end

A task calling the entry Get_Next_Character will be suspended until an interrupt is triggered and the handler reads a character from the UART device. In the meantime, other tasks will be able to execute on the CPU.

# Multi-core support

Ada supports static and dynamic allocation of tasks to cores on multi processor architectures. The Ravenscar profile restricts this support to a fully partitioned approach were tasks are statically allocated to processors and there is no task migration among CPUs. These parallel tasks running on different CPUs can communicate and synchronize using protected objects.

The CPU aspect specifies the task affinity:

   task Producer with CPU => 1;
task Consumer with CPU => 2;
--  Parallel tasks statically allocated to different cores

# Implementations

That's it for the quick overview of the basic Ada Ravenscar tasking features.

One of the advantages of having tasking as part of the language standard is the portability, you can run the same Ravenscar application on Windows, Linux, MacOs or an RTOS like VxWorks. GNAT also provides a small stand alone run-time that implements the Ravenscar tasking on bare metal. This run-time is available, for instance, on ARM Cortex-M micro-controllers.

It's like having an RTOS in your language.

]]>

### Summary

The Hexiwear is an IoT wearable development board that has two NXP Kinetis microcontrollers. One is a K64F (Cortex-M4 core) for running the main embedded application software. The other one is a KW40 (Cortex M0+ core) for running a wireless connectivity stack (e.g., Bluetooth BLE or Thread). The Hexiwear board also has a rich set of peripherals, including OLED display, accelerometer, magnetometer, gryroscope, pressure sensor, temperature sensor and heart-rate sensor. This blog article describes the development of a "Swiss Army Knife" watch on the Hexiwear platform. It is a bare-metal embedded application developed 100% in Ada 2012, from the lowest level device drivers all the way up to the application-specific code, for the Hexiwear's K64F microcontroller.

I developed Ada drivers for Hexiwear-specific peripherals from scratch, as they were not supported by AdaCore's Ada drivers library. Also, since I wanted to use the GNAT GPL 2017 Ada compiler but the GNAT GPL distribution did not include a port of the Ada Runtime for the Hexiwear board, I also had to port the GNAT GPL 2017 Ada runtime to the Hexiwear. All this application-independent code can be leveraged by anyone interested in developing Ada applications for the Hexiwear wearable device.

### Project Overview

The purpose of this project is to develop the embedded software of a "Swiss Army Knife" watch in Ada 2012 on the Hexiwear wearable device.

The Hexiwear is an IoT wearable development board that has two NXP Kinetis microcontrollers. One is a K64F (Cortex-M4 core) for running the main embedded application software. The other one is a KW40 (Cortex M0+ core) for running a wireless connectivity stack (e.g., Bluetooth BLE or Thread). The Hexiwear board also has a rich set of peripherals, including OLED display, accelerometer, magnetometer, gryroscope, pressure sensor, temperature sensor and heart-rate sensor.

The motivation of this project is two-fold. First, to demonstrate that the whole bare-metal embedded software of this kind of IoT wearable device can be developed 100% in Ada, from the lowest level device drivers all the way up to the application-specific code. Second, software development for this project will produce a series of reusable modules that can be used in the future as a basis for creating "labs" for teaching an Ada 2012 embedded software development class using the Hexiwear platform. Given the fact that the Hexiwear platform is a very attractive platform for embedded software development, its appeal can be used to attract more embedded developers to learn Ada.

The scope of the project will be to develop only the firmware that runs on the application microcontroller (K64F). Ada drivers for Hexiwear-specific peripherals need to be developed from scratch, as they are not supported by AdaCore’s Ada drivers library. Also, since I will be using the GNAT GPL 2017 Ada compiler and the GNAT GPL distribution does not include a port of the Ada Runtime for the Hexiwear board, the GNAT GPL 2017 Ada runtime needs to be ported to the Hexiwear board.

The specific functionality of the watch application for the time frame of "Make with Ada 2017" will include:

• Watch mode: show current time, date, altitude and temperature
• Heart rate monitor mode: show heart rate (when Hexiwear worn on the wrist)
• G-Forces monitor mode: show G forces in the three axis (X, Y, Z).

In addition, when the Hexiwear is plugged to a docking station, a command-line interface will be provided over the UART port of the docking station. This interface can be used to set configurable parameters of the watch and to dump debugging information.

### Summary of Accomplishments

I designed and implemented the "Swiss Army Knife" watch application and all necessary peripheral drivers 100% in Ada 2012. The only third-party code used in this project, besides the GNAT GPL Ravenscar SFP Ada runtime library, is the following:

• Ada package specification files, generated by the the svd2ada tool, containing declarations for the I/O registers of the Kinetis K64F’s peripherals.

Below are some diagrams depicting the software architecture of the "Swiss Army Knife" watch:

The current implementation of the "Swiss Army Knife" watch firmware, delivered for "Make with Ada 2017" has the following functionality:

• Three operating modes:
• Watch mode: show current time, date, altitude and temperature
• Heart rate monitor mode: show heart rate monitor raw reading, when Hexiwear worn on the wrist.
• G-Forces monitor mode: show G forces in the three axis (X, Y, Z).
• To switch between modes, the user just needs to do a double-tap on the watch’s display. When the watch is first powered on, it starts in "watch mode".
• To make the battery charge last longer, the microcontroller is put in deep-sleep mode (low-leakage stop mode) and the display is turned off, after 5 seconds. A quick tap on the watch’s display, will wake it up. Using the deep-sleep mode, makes it possible to extend the battery life from 2 hours to 12 hours, on a single charge. Indeed, now I can use the Hexiwear as my personal watch during the day, and just need to charge it at night.
• A command-line interface over UART, available when the Hexiwear device is plugged to its docking station. This interface is used for configuring the following attributes of the watch:
1. Current time (not persistent on power loss, but persistent across board resets)
2. Current date (not persistent on power loss, but persistent across board resets)
3. Background color (persistent on the K64F microcontroller’s NOR flash, unless firmware is re-imaged)
4. Foreground color (persistent on the K64F microcontroller’s NOR flash, unless firmware is re-imaged)

Also, the command-line interface can be used for dumping the different debugging logs: info, error and debug.

The top-level code of the watch application can be found at: https://github.com/jgrivera67/make-with-ada/tree/master/hexiwear_watch

I ported the GNAT GPL 2017 Ravenscar Small-Foot-Print Ada runtime library to the Hexiwear board and modify it to support the memory protection unit (MPU) of the K64 microcontroller, and more specifically to support MPU-aware tasks. This port of the Ravenscar Ada runtime can be found in GitHub at https://github.com/jgrivera67/embedded-runtimes

The relevant folders are:

I developed device drivers for the following peripherals of the Kinetis K64 micrcontroller as part of this project and contributed their open-source Ada code in GitHub at https://github.com/jgrivera67/make-with-ada/tree/master/drivers/mcu_specific/nxp_kinetis_k64f:

• DMA Engine
• SPI controller
• I2C controller
• Real-time Clock (RTC)
• Low power management
• NOR flash

These drivers are application-independent and can be easily reused for other Ada embedded applications that use the Kinetis K64F microcontroller.

I developed device drivers for the following peripherals of the Hexiwear board as part of this project and contributed their open-source Ada code in GitHub at https://github.com/jgrivera67/make-with-ada/tree/master/drivers/board_specific/hexiwear:

• OLED graphics display (on top of the SPI and DMA engine drivers, and making use of the advanced DMA channel linking functionality of the Kinetis DMA engine)
• 3-axis accelerometer (on top of the I2C driver)
• Heart rate monitor (on top of the I2C driver)
• Barometric pressure sensor (on top of the I2C driver)

These drivers are application-independent and can be easily reused for other Ada embedded applications that use the Hexiwear board.

I designed the watch application and its peripheral drivers, to use the memory protection unit (MPU) of the K64F microcontroller, from the beginning, not as an afterthought. Data protection at the individual data object level is enforced using the MPU, for the private data structures from every Ada package linked into the application. For this, I leveraged the MPU framework I developed earlier and which I presented in the Ada Europe 2017 Conference. Using this MPU framework for the whole watch application demonstrates that the framework scales well for a realistic-size application. The code of this MPU framework is part of a modified Ravenscar small-foot-print Ada runtime library port for the Kinetis K64F microcontroller, whose open-source Ada code I contributed at https://github.com/jgrivera67/embedded-runtimes/tree/master/bsps/kinetis_k64f_common/bsp

I developed a CSP model of the Ada task architecture of the watch firmware with the goal of formally verifying that it is deadlock free, using the FDR4 tool. Although I ran out of time to successfully run the CSP model through the FDR tool, developing the model helped me gain confidence about the overall completeness of the Ada task architecture as well as the completeness of the watch_task’s state machine. Also, the CSP model itself is a form a documentation that provides a high-level formal specification of the Ada task architecture of the watch code.

### Open

My Project’s code is under the BSD license It’s hosted on Github. The project is divided in two repositories:

To do the development, I used the GNAT GPL 2017 toolchain - ARM ELF format (hosted on Windows 10 or Linux), including the GPS IDE. I also used the svd2ada tool to generate Ada code from SVD XML files for the Kinetis microcontrollers I used.

### Collaborative

I designed this Ada project to make it easy for others to leverage my code. Anyone interested in developing their own flavor of "smart" watch for the Hexiwear platform can leverage code from my project. Also, anyone interested in developing any type of embedded application in Ada/SPARK for the Hexiwear platform can leverage parts of the software stack I have developed in Ada 2012 for the Hexiwear, particularly device drivers and platform-independent/application-independent infrastructure code, without having to start from scratch as I had to. All you need is to get your own Hexiwear development kit (http://www.hexiwear.com/shop/), clone my Git repositories mentioned above and get the GNAT GPL 2017 toolchain for ARM Cortex-M (http://libre.adacore.com/download/, choose ARM ELF format).

The Hexiwear platform is an excellent platform to teach embedded software development in general, and in Ada/SPARK in particular, given its rich set of peripherals and reasonable cost. The Ada software stack that I have developed for the Hexiwear platform can be used as a base to create a series of programming labs as a companion to courses in Embedded and Real-time programming in Ada/SPARK, from basic concepts and embedded programming techniques, to more advanced topics such as using DMA engines, power management, connectivity, memory protection and so on.

### Dependable

• I used the memory protection unit to enforce data protection at the granularity of individual non-local data objects, throughout the code of the watch application and its associated device drivers.
• I developed a CSP model of the Ada task architecture of the watch firmware with the goal of formally verifying that it is deadlock free, using the FDR4 tool. Although I ran out of time to successfully run the CSP model through the FDR tool, developing the model helped me gain confidence about the completeness of the watch_task’s state machine and the overall completeness of the Ada task architecture of the watch code.
• I consistently used the information hiding principles to architect the code to ensure high levels of maintainability and portability, and to avoid code duplication across projects and across platforms.
• I leveraged extensively the data encapsulation and modularity features of the Ada language in general, such as private types and child units including private child units, and in some cases subunits and nested subprograms.
• I used gnatdoc comments to document key data structures and subprograms.
• I used Ada 2012 contract-based programming features and assertions extensively
• I used range-based types extensively to leverage the Ada power to detect invalid integer values.
• I Used Ada 2012 aspect constructs wherever possible
• I Used GNAT coding style. I use the -gnaty3abcdefhiklmnoOprstux GNAT compiler option to check compliance with this standard.
• I used GNAT flags to enable rigorous compile-time checks, such as -gnato13 -gnatf -gnatwa -gnatVa -Wall.

### Inventive

• The most innovative aspect of my project is the use of the memory protection unit (MPU) to enforce data protection at the granularity of individual data objects throughout the entire code of the project (both application code and device drivers). Although the firmware of a watch is not a safety-critical application, it serves as a concrete example of a realistic-size piece of embedded software that uses the memory protection unit to enforce data protection at this level of granularity. Indeed, this project demonstrates the feasibility and scalability of using the MPU-based data protection approach that I presented at the Ada Europe 2017 conference earlier this year, for true safety-critical applications.

CSP model of the Ada task architecture of the watch code, with the goal of formally verifying that the task architecture was deadlock free, using the FDR4 model checking tool. Although I ran out of time to successfully run the CSP model through the FDR tool, the model itself provides a high-level formal specification of the Ada task architecture of the watch firmware, which is useful as a concise form of documentation of the task architecture, that is more precise than the task architecture diagram alone.

## Future Work

### Short-Term Future Plans

• Implement software enhancements to existing features of the "Swiss Army Knife" watch:

• Calibrate accuracy of altitude and temperature readings
• Calibrate accuracy of heart rate monitor reading
• Display heart rate in BPM units instead of just showing the raw sensor reading
• Calibrate accuracy of G-force readings
• Develop new features of the "Swiss Army Knife" watch:

• Display compass information (in watch mode). This entails extending the accelerometer driver to support the built-in magnetometer
• Display battery charge remaining. This entails writing a driver for the K64F’s A/D converter to interface with the battery sensor
• Display gyroscope reading. This entails writing a driver for the Hexiwear’s Gyroscope peripheral
• Develop Bluetooth connectivity support, to enable the Hexiwear to talk to a cell phone over blue tooth as a slave and to talk to other Bluetooth slaves as a master. As part of this, a Bluetooth "glue" protocol will need to be developed for the K64Fto communicate with the Bluetooth BLE stack running on the KW40, over a UART. For the BLE stack itself, the one provided by the chip manufacturer will be used. A future challenge could entail to write an entire Bluetooth BLE stack in Ada to replace the manufacturer’s KW40 firmware.
• Finish the formal analysis of the Ada task architecture of the watch code, by successfully running its CSP model through the FDR tool to verify that the architecture is deadlock free, divergence free and that it satisfies other applicable safety and liveness properties.

### Long-Term Future Plans

• Develop more advanced features of the "Swiss Army Knife" watch:

• Use sensor fusion algorithms combining readings from accelerometer and gyroscope
• Add Thread connectivity support, to enable the watch to be an edge device in an IoT network (Thread mesh). This entails developing a UART-based interface to the Hexiwear’s KW40 (which would need to be running a Thread (804.15) stack).
• Use the Hexiwear device as a dash-mounted G-force recorder in a car. Sense variations of the 3D G-forces as the car moves, storing the G-force readings in a circular buffer in memory, to capture the last 10 seconds (or more depending on available memory) of car motion. This information can be extracted over bluetooth.
• Remote control of a Crazyflie 2.0 drone, from the watch, over Bluetooth. Wrist movements will be translated into steering commands for the drone.
• Develop a lab-based Ada/SPARK embedded software development course using the Hexiwear platform, leveraging the code developed in this project. This course could include the following topics and corresponding programming labs:
1. Accessing I/O registers in Ada.
• Lab: Layout of I/O registers in Ada and the svd2ada tool
2. Pin-level I/O: Pin Muxer and GPIO.
• Lab: A Traffic light using the Hexiwear’s RGB LED
3. Embedded Software Architectures: Cyclic Executive.
• Lab: A watch using the real-time clock (RTC) peripheral with polling
4. Embedded Software Architectures: Main loop with Interrupts.
• Lab: A watch using the real-time clock (RTC) with interrupts
• Lab: A watch using the real-time clock (RTC) with tasks
6. Serial Console.
• Lab: UART driver with polling and with interrupts)
7. Sensors: A/D converter.
• Lab: Battery charge sensor
8. Actuators: Pulse Width Modulation (PWM).
• Lab: Vibration motor and light dimmer
9. Writing to NOR flash.
• Lab: Saving config data in NOR Flash
10. Inter-chip communication and complex peripherals: I2C.
• Lab: Using I2C to interface with an accelerometer
11. Inter-chip communication and complex peripherals: SPI.
• Lab: OLED display
12. Direct Memory Access I/O (DMA).
• Lab: Measuring execution time and making OLED display rendering faster with DMA
13. The Memory Protection Unit (MPU).
• Lab: Using the memory protection unit
14. Power Management.
• Lab: Using a microcontroller’s deep sleep mode
15. Cortex-M Architecture and Ada Startup Code.
• Lab: Modifying the Ada startup code in the Ada runtime library to add a reset counter)
16. Recovering from failure.
• Lab: Watchdog timer and software-triggered resets
17. Bluetooth Connectivity
• Lab: transmit accelerometer readings to a cell phone over Bluetooth
]]>
Physical Units Pass the Generic Test https://blog.adacore.com/physical-units-pass-the-generic-test Thu, 16 Nov 2017 07:30:00 +0000 Yannick Moy https://blog.adacore.com/physical-units-pass-the-generic-test

The support for physical units in programming languages is a long-standing issue, which very few languages have even attempted to solve. This issue was mostly solved for Ada in 2012 by our colleagues Ed Schonberg and Vincent Pucci, who introduced special aspects for specifying physical dimensions on types. An aspect Dimension_System allows the programmer to define a new system of physical units, while an aspect Dimension allows setting the dimensions of a given subtype in the dimension system of its parent type. The dimension system in GNAT is completely checked at compile time, with no impact on the executable size or execution time, and it offers a number of facilities for defining units, managing fractional dimensions and printing out dimensioned quantities. For details, see the article "Implementation of a simple dimensionality checking system in Ada 2012" presented at the ACM SIGAda conference HILT 2012 (also attached below).

### Attachments

The GNAT dimension system did not attempt to deal with generics though. As noted in the previous work by Grein, Kazakov and Wilson in "A survey of Physical Units Handling Techniques in Ada":

The conflict between the requirements 1 [Compile-Time Checks] and 2 [No Memory / Speed Overhead at Run-Time] on one side and requirement 4 [Generic Programming] on the other is the source of the problem of dimension handling.

So the solution in GNAT solved 1 and 2 while allowing generic programming but leaving it unchecked for dimensionality correctness. Here is the definition of generic programming by Grein, Kazakov and Wilson:

Here generic is used in a wider sense, as an ability to write code dealing with items of types from some type sets.  In our case it means items of different dimension. For instance, it should be possible to write a dimension-aware integration program, which would work for all valid combinations of dimensions.

This specific issue of programming a generic integration over time that would work for a variety of dimensions was investigated earlier this year by our partners from Technical University of Munich for their Glider autopilot software in SPARK. Working with them, we found a way to upgrade the dimensionality analysis in GNAT to support generic programming, which recently has been implemented in GNAT.

The goal was to apply dimensionality analysis on the instances of a generic, and to preserve dimension as much as possible across type conversions. In our upgraded dimensionality analysis in GNAT, a conversion from a dimensioned type (say, a length) to its dimensionless base type (the root of the dimension system) now preserves the dimension (length here), but will look like valid Ada code to any other Ada compiler. Which makes it possible to define a generic function Integral as follows:

    generic
type Integrand_Type is digits <>;
type Integration_Type is digits <>;
type Integrated_Type is digits <>;
function Integral (X : Integrand_Type; T : Integration_Type) return Integrated_Type;

function Integral (X : Integrand_Type; T : Integration_Type) return Integrated_Type is
begin
return Integrated_Type (Mks_Type(X) * Mks_Type(T));
end Integral;

We are using above the standard Mks_Type defined in System.Dim.Mks as root type, but we could do the same with the root of a user-defined dimension system. The generic code is valid Ada since the arguments X and T are converted into the same type (here Mks_Type), in order to multiply them without compilation error. The result, which is now of type Mks_Type, is then converted to the final type Integrated_Type. With the original dimensionality analysis in GNAT, dimensions were lost during these type conversions.

However, with the upgraded analysis in GNAT, a conversion to the root type indicates that the dimensions of X and T have to be preserved and tracked when converting both to and from the root type Mks_Type. With this, still using the standard types defined in System.Dim.Mks, we can define an instance of this generic that integrates speed over time:

    function Velocity_Integral is new Integral(Speed, Time, Length);

The GNAT-specific dimensionality analysis will perform additional checks for correct dimensionality in all such generic instances, while for any other Ada compiler this program still passes as valid (but not dimensionality-checked) Ada code. For example, for an invalid instance as:

    function Bad_Velocity_Integral is new Integral(Speed, Time, Mass);

GNAT issues the error:

dims.adb:10:05: instantiation error at line 5
dims.adb:10:05: target type has dimension [M]

One subtlety that we faced when developing the Glider software at Technical University of Munich was that, sometimes, we do want to convert a value from a dimensioned type into another dimensioned type. This was the case in particular because we defined our own dimension system in which angles had their own dimension to verify angular calculations, which worked well most of the time.

However, the angle dimension must be removed when multiplying an angle with a length, which produces an (arc) length, or when using an existing trigonometric function that expects a dimensionless argument. In theses cases, a simple type conversion to the dimensionless root type is not enough, because now the dimension of the input is preserved. We found two solutions to this problem:

• either define the root of our dimension system as a derived type from a parent type, say Base_Unit_Type, and convert to/from Base_Unit_Type to remove dimensions; or
• explicitly insert conversion coefficients into the equations with dimensions such that the dimensions do cancel out as required.

For example, our use of an explicit Angle_Type with its own dimension (denoted A) first seemed to cause trouble because of conversions such as this one:

Distance := 2.0 * EARTH_RADIUS * darc; -- expected L, found L.A

where darc is of Angle_Type (dimension A) and EARTH_RADIUS of Length_Type (dimension L). First, we escaped the unit system as follows:

Distance := 2.0 * EARTH_RADIUS * Unit_Type(Base_Unit_Type(darc));

However, this bypasses the dimensionality checking system and can lead to dangerous mixing of physical dimensions. It would be possible to accidentally turn a temperature into a distance, without any warning. A safer way to handle this issue is to insert the missing units explicitly:

Distance := 2.0 * EARTH_RADIUS * darc * 1.0/Radian;

Here, Radian is the unit of Angle_Type, which we need to get rid of to turn an angle into a distance. In other words, the last term represents a coefficient with the required units to turn an angle into a distance. Thus, darc*1.0/Radian still carries the same value as darc, but is dimensionless as required per the equation, and GNAT can perform a dimensionality analysis also in such seemingly dimensionality-defying situations.

Moreover, this solution is less verbose than converting to the base unit type and then back. In fact, it can be made even shorter:

Distance := 2.0 * EARTH_RADIUS * darc/Radian;

With its improved dimensionality analysis, GNAT Pro 18 has solved the conflict between requirements 1 [Compile-Time Checks] and 2 [No Memory / Speed Overhead at Run-Time] on one side and requirement 4 [Generic Programming] on the other side, hopefully making Grein, Kazakov and Wilson happier! The dimensionality analysis in GNAT is a valuable feature for programs that deal with physical units. It increases readability by making dimensions more explicit and it reduces programming errors by checking the dimensions for consistency. For example, we used it on the StratoX Weather Glider from Technical University of Munich, as well as the RESSAC User Case, an example of autonomous vehicle development used as challenge for certification.

For more information on the dimensionality analysis in GNAT, see the GNAT User's Guide. In particular, the new rules that deal with conversions are at the end of the section, and we copy them verbatim below:

The dimension vector of a type conversion T(expr) is defined as follows, based on the nature of T:
-  If T is a dimensioned subtype then DV(T(expr)) is DV(T) provided that either expr is dimensionless or DV(T) = DV(expr). The conversion is illegal if expr is dimensioned and DV(expr) /= DV(T). Note that vector equality does not require that the corresponding Unit_Names be the same.
As a consequence of the above rule, it is possible to convert between different dimension systems that follow the same international system of units, with the seven physical components given in the standard order (length, mass, time, etc.). Thus a length in meters can be converted to a length in inches (with a suitable conversion factor) but cannot be converted, for example, to a mass in pounds.
-  If T is the base type for expr (and the dimensionless root type of the dimension system), then DV(T(expr)) is DV(expr). Thus, if expr is of a dimensioned subtype of T, the conversion may be regarded as a “view conversion” that preserves dimensionality.
This rule makes it possible to write generic code that can be instantiated with compatible dimensioned subtypes. The generic unit will contain conversions that will consequently be present in instantiations, but conversions to the base type will preserve dimensionality and make it possible to write generic code that is correct with respect to dimensionality.
-  Otherwise (i.e., T is neither a dimensioned subtype nor a dimensionable base type), DV(T(expr)) is the empty vector. Thus a dimensioned value can be explicitly converted to a non-dimensioned subtype, which of course then escapes dimensionality analysis.

Thanks to Ed Schonberg and Ben Brosgol from AdaCore for their work on the design and implementation of this enhanced dimensionality analysis in GNAT.

]]>

Not long after my first experience with the Ada programming language I got to know about the Make With Ada 2017 contest. And, just as it seemed, it turned out to be a great way to get a little bit deeper into the language I had just started to learn

The ada-motorcontrol project involves the design of a BLDC motor controller software platform written in Ada. These types of applications often need to be run fast and the core control software is often tightly connected to the microcontroller peripherals. Coming from an embedded systems perspective with C as the reference language, the initial concerns were if an implementation in Ada actually could meet these requirements.

It turned out, on the contrary, that Ada is very capable considering both these requirements. In particular, accessing peripherals on the STM32 with help of the Ada_Drivers_Library really made using the hardware related operations even easier than using the HAL written in C by ST.

Throughout the project I found uses for many of Ada’s features. For example, the representation clause feature made it simple to extract data from received (and to put together the transmit) serial byte streams. Moreover, contract based programming and object oriented concepts such as abstracts and generics provided means to design clean and easy to use interfaces, and a well organized project.

One of the objectives of the project was to provide a software platform to help developing various motor control applications, with the core functionality not being dependent on some particular hardware. Currently however it only supports a custom inverter board, since unfortunately I found that the HAL provided in Ada_Drivers_Library was not comprehensive enough to support all the peripheral features used. But the software is organized such as to keep driver dependent code separated. To put this to test, I welcome contributions to add support for other inverter boards. A good start would be the popular VESC-board.

# Ada Motor Controller Project Log:

## Motivation

The recent advances in electric drives technologies (batteries, motors and power electronics) has led to increasingly higher output power per cost, and power density. This in turn has increased the performance of existing motor control applications, but also enabled some new - many of them are popular projects amongst diyers and makers, e.g. electric bike, electric skateboard, hoverboard, segway etc.

On a hobby-level, the safety aspects related to these is mostly ignored. Professional development of similar applications, however, normally need to fulfill some domain specific standards putting requirements on for example the development process, coding conventions and verification methods. For example, the motor controller of an electric vehicle would need to be developed in accordance to ISO 26262, and if the C language is used, MISRA-C, which defines a set of programming guidelines that aims to prevent unsafe usage of the C language features.

Since the Ada programming language has been designed specifically for safety critical applications, it could be a good alternative to C for implementing safe motor controllers used in e.g. electric vehicle applications. For a comparison of MISRA-C and Ada/SPARK, see this report. Although Ada is an alternative for achieving functional safety, during prototyping it is not uncommon that a mistake leads to destroyed hardware (burned motor or power electronics). Been there, done that! The stricter compiler of Ada could prevent such accidents.

Moreover, while Ada is not a particularly "new" language, it includes more features that would be expected by a modern language, than is provided by C. For example, types defined with a specified range, allowing value range checks already during compile time, and built-in multitasking features. Ada also supports modularization very well, allowing e.g. easy integration of new control interfaces - which is probably the most likely change needed when using the controller for a new application.

This project should consist of and fulfill:

• Core software for controlling brushless DC motors, mainly aimed at hobbyists and makers.
• Support both sensored and sensorless operation.
• Open source software (and hardware).
• Implementation in Ada on top of the Ravenscar runtime for the stm32f4xx.
• Should not be too difficult to port to another microcontroller.

And specifically, for those wanting to learn the details of motor control, or extend with extra features:

• Provide a basic, clean and readable implementation.
• Meaningful and flexible logging.
• Easy to add new control interfaces (e.g. CAN, ADC, Bluetooth, whatever).

## Hardware

The board that will be used for this project is a custom board that I previously designed with the intent to get some hands-on knowledge in motor control. It is completely open source and all project files can be found on GitHub

• Microcontroller STM32F446, ARM Cortex-M4, 180 MHz, FPU
• Power MOSFETs 60 V
• Inline phase current sensing
• PWM/PPM control input
• Position sensor input as either hall or quadrature encoder
• Motor and board temp sensor (NTC)

It can handle power ranges in the order of what is required by an electric skateboard or electric bike, depending on the used battery voltage and cooling.

There are other inverter boards with similar specifications. One very popular is the VESC by Benjamin Vedder. It is probably not that difficult to port this project to work on that board as well.

## Rough project plan

I thought it would be appropriate to write down a few bullets of what needs to be done. The list will probably grow...

• Create a port of the Ravenscar runtime to the stm32f446 target on the custom board
• Get some sort of hello world application running to show that stuff works
• Investigate and experiment with interrupt handling with regards to overhead
• Create initialization code for all used mcu peripherals
• Sketch the overall software architecture and define interfaces
• Implementation
• Documentation...

## Support for the STM32F446

The microprocessor that will be used for this project is the STM32F446. In the current version of the Ada Drivers Library and the available Ravenscar embedded runtimes, there is no explicit support for this device. Fortunately, it is very similar to other processors in the stm32f4-family, so adding support for stm32f446 was not very difficult once I understood the structure of the repositories. I forked these and added them as submodules in this project's repo

Compared to the Ravenscar runtimes used by the discovery-boards, there are differences in external oscillator frequency, available interrupt vectors and memory sizes. Otherwise they are basically the same.

An important tool needed to create the new driver and runtime variants is svd2ada. It generates device specification and body files in ada based on an svd file (basically xml) that describes what peripherals exist, how their registers look like, their address', existing interrupts, and stuff like that. It was easy to use, but a little bit confusing how flags/switches should be set when generating driver and runtime files. After some trail and error I think I got it right. I created a Makefile for generating all these file with correct switches.

I could not find an svd-file for the stm32f446 directly from ST, but found one on the internet. It was not perfect though. Some of the source code that uses the generated data types seems to make assumptions on the structure of these types. Depending on how the svd file looks, svd2ada may or may not generate them in the expected way. There were also other missing and incorrect data in the svd file, so I had to manually correct these. There are probably additional issues that I have not found yet...

## It is alive!

I made a very simple application consisting of a task that is periodically delayed and toggles the two leds on the board each time the task resumes. The leds toggles with the expected period, so the oscillator seems to be initialized correctly.

Next up I need to map the different mcu pins to the corresponding hardware functionality and try to initialize the needed peripherals correctly.

## The control algorithm and its use of peripherals

There are several methods of controlling brushless motors, each with a specific use case. As a first approach I will implement sensored FOC, where the user requests a current value (or torque value).

To simplify, this method can be divided into the following steps, repeated each PWM period (typically around 20 kHz):

1. Sample the phase currents
2. Transform the values into a rotor fixed reference frame
3. Based on the requested current, calculate a new set of phase voltages
4. Transform back to the stator's reference frame
5. Calculate PWM duty cycles as to create the calculated phase voltages

Fortunately, the peripherals of the stm32f446 has a lot of features that makes this easier to implement. For example, it is possible to trigger the ADC directly from the timers that drives the PWM. This way the sampling will automatically be synchronized with the PWM cycle. Step 1 above can thus be started immediately as the ADC triggers the corresponding conversion-complete-interrupt. In fact, many existing implementations perform all the steps 1-to-6 completely within an ISR. The reason for this is simply to reduce any unnecessary overhead since the performed calculations is somewhat lengthy. The requested current is passed to the ISR via global variables.

I would like to do this the traditional way, i.e. to spend as little time as possible in the ISR and trigger a separate Task to perform all calculations. The sampled current values and the requested current shall be passed via Protected Objects. All this will of course create more overhead. Maybe too much? Need to be investigated.

## PWM and ADC is up and running

I have spent some time configuring the PWM and ADC peripherals using the Ada Drivers Library. All in all it went well, but I had to do some smaller changes to the drivers to make it possible to configure the way I wanted.

• PWM is complementary output, center aligned with frequency of 20 kHz
• PWM channels 1 to 3 generates the phase voltages
• PWM channel 4 is used to trigger the ADC, this way it is possible to set where over the PWM period the sampling should occur
• By default the sampling occurs in the middle of the positive waveform (V7)
• The three ADC's are configured to Triple Multi Mode, meaning they are synchronized such that each sampled phase quantity is sampled at the same time.
• Phase currents and voltages a,b,c are mapped to the injected conversions, triggered by the PWM channel 4
• Board temperature and bus voltage is mapped to the regular conversions triggered by a timer at 14 kHz
• Regular conversions are moved to a volatile array using DMA automatically after the conversions complete
• ADC create an interrupt after the injected conversions are complete

The drivers always assumed that the PWM outputs are mapped to a certain GPIO, so in order to configure the trigger channel I had to add a new procedure to the drivers. Also, the Scan Mode of the ADCs where not set up correctly for my configuration, and the config of injected sequence order was simply incorrect. I will send a pull request to get these changes merged with the master branch.

As was described in previous posts the structure used for the interrupt handling is to spend minimum time in the interrupt context and to signal an awaiting task to perform the calculations, which executes at a software priority level with interrupts fully enabled. The alternative method is to place all code in the interrupt context.

This Ada Gem and its following part describes two different approaches for doing this type of task synchronization. Both use a protected procedure as the interrupt handler but signals the awaiting task in different ways. The first uses an entry barrier and the second a Suspension Object. The idiom using the entry barrier has the advantage that it can pass data as an integrated part of the signaling, while the Suspension Object behaves more like a binary semaphore.

For the ADC conversion complete interrupt, I tested both methods. The protected procedure used as the ISR read the converted values consisting of six uint16. For the entry barrier method these where passed to the task using an out-parameter. When using the second method the task needed to collect the sample data using a separate function in the protected object.

Overhead in this context I define as the time from that the ADC generates the interrupt, to the time the event triggered task starts running. This includes, first, an isr-wrapper that is a part of the runtime which then calls the installed protected procedure, and second, the execution time of the protected procedure which reads the sampled data, and finally, the signaling to the awaiting task.

I measured an approximation of the overhead by setting a pin high directly in the beginning of the protected procedure and then low by the waiting task directly when waking up after the signaling. For the Suspension Object case the pin was set low after the read data function call, i.e. for both cases when the sampled data was copied to the task. The code was compiled with the -O3 flag.

The first idiom resulted in an overhead of ~8.4 us, and the second ~10 us. This should be compared to the period of the PWM which at 20 kHz is 50 us. Obviously the overhead is not negligible, so I might consider using the more common approach for motor control applications of having the current control algorithm in the interrupt context instead. However, until the execution time of the algorithm is known, the entry barrier method will be assumed...

Note: "Overhead" might be the wrong term since I don't know if during the time measured the cpu was really busy. Otherwise it should be called latency I think...

## Reference frames

A key benefit of the FOC algorithm is that the actual control is performed in a reference frame that is fixed to the rotor. This way the sinusoidal three phase currents, as seen in the stator's reference frame, will instead be represented as two DC values, assuming steady state operation. The transforms used (Clarke and Park) requires that the angle between the rotor and stator is known. As a first step I am using a quadrature encoder since that provides a very precise measurement and very low overhead due to the hardware support of the stm32.

Three types has been defined, each representing a particular reference frame: Abc, Alfa_Beta and Dq. Using the transforms above one can simply write:

declare
Iabc  : Abc;  --  Measured current (stator ref)
Idq   : Dq;   --  Measured current (rotor ref)
Vdq   : Dq;   --  Calculated output voltage (rotor ref)
Vabc  : Abc;  --  Calculated output voltage (stator ref)
Angle : constant Float := ...;
begin
Idq := Iabc.Clarke.Park(Angle);

--  Do the control...

Vabc := Vdq.Park_Inv(Angle).Clarke_Inv;
end;

Note that Park and Park_Inv both use the same angle. To be precise, they
both use Sin(Angle) and Cos(Angle). Now, at first, I simply implemented
these by letting each transform calculate Sin and Cos locally. Of
course, that is a waste for this particular application. Instead, I
defined an angle object that when created also computed Sin and Cos of
the angle, and added versions of the transforms to use these

declare
--  Same...

Angle : constant Angle_Obj := Compose (Angle_Rad);
--  Calculates Sin and Cos
begin
Idq := Iabc.Clarke.Park(Angle);

--  Do the control...

Vabc := Vdq.Park_Inv(Angle).Clarke_Inv;
end;

This reduced the execution time somewhat (not as much as I thought, though), since the trigonometric functions are the heavy part. Using lookup table based versions instead of the ones provided by Ada.Numerics might be even faster...

## It spins!

The main structure of the current controller is now in place. When a button on the board is pressed the sensor is aligned to the rotor by forcing the rotor to a known angle. Currently, the requested q-current is set by a potentiometer.

As of now, it is definitely not tuned properly, but it at least it shows that the general algorithm is working as intended.

In order to make this project easier to develop on, both for myself and any other users, I need to add some logging and tuning capabilities. This should allow a user to change and/or log variables in the application (e.g. control parameters) while the controller is running. I have written a tool for doing this (over serial) before, but then in C. It would be interesting to rewrite it in Ada.

## Contract Based Programming

So far, I have not used this feature much. But when writing code for the logging functionality I ran into a good fit for it.

I am using Consistent Overhead Byte Stuffing (COBS) to encode the data sent over uart. This encoding results in unambiguous packet framing regardless of packet content, thus making it easy for receiving applications to recover from malformed packets. The packets are separated by a delimiter (value 0 in this case), making it easy to synchronize the receiving parser. The encoding ensures that the encoded packet itself does not contain the delimiter value.

A good feature of COBS is that given that the raw data length is less than 254, then the overhead due to the encoding is always exactly one byte. I could of course simply write this fact as a comment to the encode/decode functions, allowing the user to make this assumption in order to simplify their code. A better way could be to write this condition as contracts.

   Data_Length_Max : constant Buffer_Index := 253;

function COBS_Encode (Input : access Data)
return Data
with
Pre => Input'Length <= Data_Length_Max,
Post => (if Input'Length > 0 then
COBS_Encode'Result'Length = Input'Length + 1
else
Input'Length = COBS_Encode'Result'Length);

function COBS_Decode (Encoded_Data : access Data)
return Data
with
Pre => Encoded_Data'Length <= Data_Length_Max + 1,
Post => (if Encoded_Data'Length > 0 then
COBS_Decode'Result'Length = Encoded_Data'Length - 1
else
Encoded_Data'Length = COBS_Decode'Result'Length);

## Logging and Tuning

I just got the logging and tuning feature working. It is an Ada-implementation using the protocol as used by a previous project of mine, Calmeas. It enables the user to log and change the value of variables in the application, in real-time. This is very helpful when developing systems where the debugger does not have a feature of reading and writing to memory while the target is running.

The data is sent and received over uart, encoded by COBS. The interfaces of the uart and cobs packages implements an abstract stream type, meaning it is very simple to change the uart to some other media, and that e.g. cobs can be skipped if wanted.

### Example

The user can simply do the following in order to get the variable V_Bus_Log loggable and/or tunable:

V_Bus_Log  : aliased Voltage_V;
...
Name        => "V_Bus",
Description => "Bus Voltage [V]");

It works for (un)signed integers of size 8, 16 and 32 bits, and for floats.

After adding a few variables, and connecting the target to the gui:

As an example, this could be used to tune the current controller gains:

As expected, the actual current comes closer to the reference as the gain increases

As of now, the tuning is not done in a "safe" way. The writing to added symbols is done by the separate task named Logger, simply by doing unchecked writes to the address of the added symbol, one byte at a time. At the same time the application is reading the symbol's value from another task with higher prio. The optimal way would be to pass the value through a protected type, but since the tuning is mostly for debugging purposes, I will make it the proper way later on...

Note that the host GUI is not written in Ada (but Python), and is not itself a part of this project.

## Architecture overview

Here is a figure showing an overview of the software:

# Summary

This project involves the design of a software platform that provides a good basis when developing motor controllers for brushless motors. It consist of a basic but clean and readable implementation of a sensored field oriented control algorithm. Included is a logging feature that will simplify development and allows users to visualize what is happening. The project shows that Ada successfully can be used for a bare-metal project that requires fast execution.

The design is, thanks to Ada's many nice features, much easier to understand compared to a lot of the other C-implementations out there, where, as a worst case, everything is done in a single ISR. The combination of increased design readability and the strictness of Ada makes the resulting software safer and simplifies further collaborative development and reuse.

Some highlights of what has been done:

• Porting of the Ravenscar profiles to a custom board using the STM32F446
• Adding some functionality to Ada_Drivers_Library in order to fully use all peripheral features
• Written HAL-isch packages so that it is easy to port to another device than STM32
• Written a communication package and defined interfaces in order to make it easier to add control inputs.
• Written a logging package that allows the developer to debug, log and tune the application in real-time.
• Implemented a basic controller using sensored field oriented control
• Well documented specifications with a generated html version

Future plans:

• Add hall sensor support and 6-step block commutation
• Add CAN support (the pcb has currently no transceiver, though)
• SPARK proving
• Write some additional examples showing how to use the interfaces.
• Port the software to the popular VESC-board.
]]>
Prove in the Cloud https://blog.adacore.com/prove-in-the-cloud Wed, 18 Oct 2017 04:00:00 +0000 Yannick Moy https://blog.adacore.com/prove-in-the-cloud

We have put together a byte (8 bits) of examples of SPARK code on a server in the cloud here. (many thanks Nico Setton for that!)

Each example consists in very few lines, a few files at most, and demoes an interesting feature of SPARK. The initial version is incorrect, and hitting the "Prove" button will return with messages that point to the errors. By following the suggested fix in comments you should be able to get the code to prove automatically.

Of course, you could already do this by installing yourself SPARK GPL on a machine (download here and follow these instructions to install additional provers). The benefit with this webpage is that anyone can now experiment live with SPARK without installing first the toolset. This is very much inspired from what Microsoft Research has done with other verification tools as part of their rise4fun website.

Something particularly interesting for academics is that all the code for this widget is open source. So you can setup your own proof server for hands-on sessions, with your own exercises, in a matter of minutes! Just clone the code_examples_server project from GitHub, follow the instructions in the README, populate your server with your exercises and examples, and you're set! No need to ask for IT to setup all boxes for your students, they just need a browser to point to your server location. An exercise consists in a directory with:

• a file example.yaml with the name and description of the exercise (in YAML syntax)
• a GNAT project file main.gpr (could be almost empty, or force the use of SPARK_Mode so that all the code is analyzed)
• the source files for this exercise

For inspiration, see the examples from the Compile_And_Prove_Demo project from GitHub, inside directory 'examples', which were used to populate our little online proof webpage.

Feel free to report problems or suggest improvements to the widget or the examples on their respective GitHub project pages.

]]>
SPARK Tutorial at FDL Conference https://blog.adacore.com/spark-tutorial-at-fdl-conference Tue, 12 Sep 2017 04:00:00 +0000 Yannick Moy https://blog.adacore.com/spark-tutorial-at-fdl-conference

Researcher Martin Becker from Technische Universität München is giving a SPARK tutorial next week, on Monday 18th, at the Forum on specification & Design Languages in Verona, Italy. Even if you cannot attend, you may find it useful to look at the material for his tutorial, with a complete cookbook to install and setup SPARK, and a 90-minutes slide deck packed with rich and practical information about SPARK, as well as neat hands-on to get a feel for using SPARK in practice.

It is all here with the very good slides in particular.

]]>
New SPARK Cheat Sheet https://blog.adacore.com/new-spark-cheat-sheet Thu, 24 Aug 2017 04:00:00 +0000 Yannick Moy https://blog.adacore.com/new-spark-cheat-sheet

Our good friend Martin Becker has produced a new cheat sheet for SPARK, that you may find useful for a quick reminder on syntax that you have not used for some time. It is simpler than the cheat sheets we already had in English and Japanese, and depending on your style you will prefer one of the other. Also particularly useful for trainings! (and I'll use it for sure, since I have multiple trainings in the coming months...)

]]>

While we are working very hard on semantic analysis in Libadalang, it is already possible to leverage its lexical and syntactic analyzers. A useful example for this is a syntax highlighter.

In the context of programming languages, syntax highlighters make it easier for us, humans, to read source code. For example, formatting keywords with a bold font and a special color, while leaving identifiers with a default style, enables a quick understanding of a program’s structure. Syntax highlighters are so useful that they’re integrated into daily developer tools:

• most “serious” code editors provide highlighters for tons of programming languages (C, Python, Ada, OCaml, shell scripts), markup languages (XML, HTML, BBcode), DSLs (Domain Specific Languages: SQL, TeX), etc.
• probably all online code browsers ship syntax highlighters at least for mainstream languages: GitHub, Bitbucket, GitLab, …

From programming language theory, that many of us learned in engineering school (some even with passion!), we can distinguish two highlighting levels:

1. Tokens(lexeme)-based highlighting. Each token is given a style from its token kind (keyword, comment, integer literal, identifier, …).

2. Syntax-based highlighting. This one is higher-level: for example, give a special color for identifiers that give their name to functions and another color for identifiers that give their name to type declarations.

Most syntax highlighting engines, for instance pygments’s or vim’s, are based on regular expressions, which don’t offer to highlighter writers the same formalism as what we just described. Generally, regular expressions enable us to create an approximation of the two highlighting levels, which yields something a bit fuzzy. On the other hand, it is much easier to write such highlighters and to get pretty results quickly, compared to a fully fledged lexer/parser, which is probably why regular expressions are so popular in this context.

But here we already have a full lexer/parser for the Ada language: Libadalang. So why not use it to build a “perfect” syntax highlighter? This is going to be the exercise of the day. All blog posts so far only demonstrated the use of Libadalang’s Python API; Libadalang is primarily an Ada library, so let’s use its Ada API for once!

One disclaimer, first: the aim of this post is not to say that the world of syntax highlighters is broken and should be re-written with compiler-level lexers and parsers. Regexp-based highlighters are a totally fine compromise in contexts such as text editors; here we just demonstrate how Libadalang can be used to achieve a similar goal, but keep in mind that the result is not technically equivalent. For instance, what we will do below will require valid input Ada sources and will only work one file at a time, unlike editors that might need to work on smaller granularity items to keep the UI responsive, which is more important in this context than correctness.

Okay so, how do we start?

The first thing to do as soon as one wants to use Libadalang is to create an analysis context: this is an object that will enclose the set of sources files to process.

with Libadalang.Analysis;

Ctx : LAL.Analysis_Context := LAL.Create;

Good. At the end of our program, we need to release the resources that were allocated in this context:

LAL.Destroy (Ctx);

Now everything interesting must happen in between. Let’s ask Libadalang to parse a source file:

Unit : LAL.Analysis_Unit := LAL.Get_From_File
(Ctx, "my_source_file.adb", With_Trivia => True);

The analysis of a source file yields what we call in Libadalang an analysis unit. This unit is tied to the context used to create it.

Here, we also enable a peculiar analysis option: With_Trivia tells Libadalang not to discard “trivia” tokens. Inspired by the Roslyn Compiler, what we call a trivia token is a token that is ignored when parsing source code. In Ada as in most (all?) programming languages, comments are trivia: developers are allowed to put any number of them anywhere between two tokens, this will not change the validity of the program nor its semantics. Because of this most compiler implementations just discard them: keeping them around would hurt performance for no gain. Libadalang is designed for all kinds of tools, not only compilers, so we give the choice to the user of whether or not to keep trivia around.

What we are trying to do here is to highlight a source file. We want to highlight comments as well, so we need to ask to preserve trivia.

At this point, a real world program would have to check if parsing happened with no error. We are just playing here, so we’ll skip that, but you can have a look at the Libadalang.Analysis.Has_Diagnostic and Libadalang.Analysis.Diagnostics functions if you want to take care of this.

Fine, so we assume parsing went well and now we just have to go through tokens and assign a specific style to each of them. First, let’s have a look at the various token-related data types in Libadalang we have to deal with:

• LAL.Token_Type: reference to a token/trivia in a specific analysis unit. Think of it as a cursor in a standard container. There is one special value: No_Token which, as you may guess, is used to represent the end of the token stream or just an invalid reference, like a null access.

• LAL.Token_Data_Type: holder for the data related to a specific token. Namely: token kind, whether it’s trivia, index in the token/trivia stream and source location range.

• Libadalang.Lexer.Token_Data_Handlers.Token_Index: a type derived from Integer to represent the indexes in token/trivia streams.

Then let’s define holders to annotate the token stream:

type Highlight_Type is (Text, Comment, Keyword, Block_Name, ...);

Instead of directly assigning colors to the various token kinds, this enumeration defines categories for highlighting. This makes it possible to provide different highlighting styles later: one set of colors for a white background, and another one for a black background, for example.

subtype Token_Index is

type Highlight_Array is
array (Token_Index range <>) of Highlight_Type;

type Highlights_Holder (Token_Count, Trivia_Count : Token_Index) is
record
Token_Highlights  : Highlight_Array (1 .. Token_Count);
Trivia_Highlights : Highlight_Array (1 .. Trivia_Count);
end record;

In Libadalang, even though tokens and trivia make up a logical interleaved stream, they are stored as two separate streams, hence the need for two arrays. So here is a procedure to make the annotating process easier:

procedure Set
(Highlights : in out Highlights_Holder;
Token      : LAL.Token_Data_Type;
HL         : Highlight_Type)
is
Index : constant Token_Index := LAL.Index (Token);
begin
if LAL.Is_Trivia (Token) then
Highlights.Trivia_Highlights (Index) := HL;
else
Highlights.Token_Highlights (Index) := HL;
end if;
end Set;


Now let’s start the actual highlighting! We begin with the token-based one as described earlier.

Basic_Highlights : constant
| -- ...
=> Keyword,
--  ...
);

The above declaration associate a highlighting class for each token kind defined in Libadalang.Lexer. The only work left is to determine highlighting classes iterating on each token in Unit:

Token : LAL.Token_Type := LAL.First_Token (Unit);

while Token /= LAL.No_Token loop
declare
TD : constant LAL.Token_Data_Type := LAL.Data (Token);
HL : constant Highlight_Type :=
Basic_Highlights (LAL.Kind (TD));
begin
Set (Highlights, TD, HL);
end;
Token := LAL.Next (Token);
end loop;

Easy, right? Once this code has run, we already have a pretty decent highlighting for our analysis unit! The second pass is just a refinement that uses syntax as described at the top of this blog post:

function Syntax_Highlight
(Node : access LAL.Ada_Node_Type'Class) return LAL.Visit_Status;

LAL.Traverse (LAL.Root (Unit), Syntax_Highlight'Access);

LAL.Traverse will traverse Unit’s syntax tree (AST) and call the Syntax_Highlight function on each node. This function is a big dispatcher on the kind of the visited node:

function Syntax_Highlight
(Node : access LAL.Ada_Node_Type'Class) return LAL.Visit_Status
is
procedure Highlight_Block_Name
(Name : access LAL.Name_Type'Class) is
begin
Highlight_Name (Name, Block_Name, Highlights);
end Highlight_Block_Name;
begin
case Node.Kind is
declare
Subp_Spec : constant LAL.Subp_Spec :=
LAL.Subp_Spec (Node);

Params : constant LAL.Param_Spec_Array_Access :=
Subp_Spec.P_Node_Params;

begin
Highlight_Block_Name
(Subp_Spec.F_Subp_Name, Highlights);
Highlight_Type_Expr
(Subp_Spec.F_Subp_Returns, Highlights);
for Param of Params.Items loop
Highlight_Type_Expr
(Param.F_Type_Expr, Highlights);
end loop;
end;

Highlight_Block_Name
(LAL.Subp_Body (Node).F_End_Id, Highlights);

Set (Highlights,
LAL.Data (Node.Token_Start),
Keyword_Type);
Highlight_Block_Name
(LAL.Type_Decl (Node).F_Type_Id, Highlights);

Highlight_Block_Name
(LAL.Subtype_Decl (Node).F_Type_Id, Highlights);

--  ...
end case;
return LAL.Into;
end Syntax_Highlight;

Depending on the nature of the AST node to process, we apply specific syntax highlighting rules. For example, the first one above: for subprogram specifications (Subp_Spec), we highlight the name of the subprogram as a “block name” while we highlight type expressions for the return type and the type of all parameters as “type expressions”. Let’s go deeper: how do we highlight names?

procedure Highlight_Name
(Name       : access LAL.Name_Type'Class;
HL         : Highlight_Type;
Highlights : in out Highlights_Holder) is
begin
if Name = null then
return;
end if;

case Name.Kind is
--  Highlight the only token that this node has
declare
Tok : constant LAL.Token_Type :=
LAL.Single_Tok_Node (Name).F_Tok;
begin
Set (Highlights, LAL.Data (Tok), HL);
end;

--  Highlight both the prefix, the suffix and the
--  dot token.

declare
Dotted_Name : constant LAL.Dotted_Name :=
LAL.Dotted_Name (Name);
Dot_Token   : constant LAL.Token_Type :=
LAL.Next (Dotted_Name.F_Prefix.Token_End);
begin
Highlight_Name
(Dotted_Name.F_Prefix, HL, Highlights);
Set (Highlights, LAL.Data (Dot_Token), HL);
Highlight_Name
(Dotted_Name.F_Suffix, HL, Highlights);
end;

--  Just highlight the name of the called entity
Highlight_Name
(LAL.Call_Expr (Name).F_Name, HL, Highlights);

when others =>
return;
end case;
end Highlight_Name;

The above may be quite long, but what it does isn’t new: just as in the Syntax_Highlight function, we execute various actions depending on the kind of the input AST node. If it’s a mere identifier, then we just have to highlight the corresponding only token. If it’s a dotted name (X.Y in Ada), we highlight the prefix (X), the suffix (Y) and the dot in between as names. And so on.

At this point, we could create other syntactic highlighting rules for remaining AST nodes. This blog post is already quite long, so we’ll stop there.

There is one piece that is missing before our syntax highlighter can become actually useful: output formatted source code. Let’s output HTML, as this format is easy to produce and quite universal. We start with a helper analogous to the previous Set procedure, to deal with the dual token/trivia streams:

function Get
(Highlights : Highlights_Holder;
Token      : LAL.Token_Data_Type) return Highlight_Type
is
Index : constant Token_Index := LAL.Index (Token);
begin
return (if LAL.Is_Trivia (Token)
then Highlights.Trivia_Highlights (Index)
else Highlights.Token_Highlights (Index));
end Get;

And now let’s get to the output itself. This starts with a simple iteration on token, so the outline is similar to the first highlighting pass we did above:

Token     : LAL.Token_Type := LAL.First_Token (Unit);
Last_Sloc : Slocs.Source_Location := (1, 1);

Put_Line ("<pre>");
while Token /= LAL.No_Token loop
declare
TD         : constant LAL.Token_Data_Type :=
LAL.Data (Token);
HL         : constant Highlight_Type :=
Get (Highlights, TD);
Sloc_Range : constant Slocs.Source_Location_Range :=
LAL.Sloc_Range (TD);

Text : constant Langkit_Support.Text.Text_Type :=
LAL.Text (Token);
begin
while Last_Sloc.Line < Sloc_Range.Start_Line loop
New_Line;
Last_Sloc.Line := Last_Sloc.Line + 1;
Last_Sloc.Column := 1;
end loop;

if Sloc_Range.Start_Column > Last_Sloc.Column then
Indent (Integer (Sloc_Range.Start_Column - Last_Sloc.Column));
end if;

Put_Token (Text, HL);
Last_Sloc := Slocs.End_Sloc (Sloc_Range);
end;
Token := LAL.Next (Token);
end loop;
Put_Line ("</pre>");

The tricky part here is that tokens alone are not enough: we use the source location information (line and column numbers) associated to tokens in order to re-create line breaks and whitespaces: this is what the inner while loop and if statement do. As usual, we delegate “low-level” actions to dedicated procedures:

procedure Put_Token
(Text : Langkit_Support.Text.Text_Type;
HL   : Highlighter.Highlight_Type) is
begin
Put ("<span style=""color: #" & Get_HTML_Color (HL)
& ";"">");
Put (Escape (Text));
Put ("</span>");
end Put_Token;

procedure New_Line is
begin
Put ((1 => ASCII.LF));
end New_Line;

procedure Indent (Length : Natural) is
begin
Put ((1 .. Length => ' '));
end Indent;

Writing the Escape function, which wraps special HTML characters such as < or > into HTML entities (< and cie), and Get_HTML_Color, which returns a suitable hexadecimal string to encode the color corresponding to a highlighting category (for instance: #ff0000, i.e. the red color, for keywords) is left as an exercise to the reader.

Note that Escape must deal with a Text_Type formal. This type, which is really a subtype of Wide_Wide_String, is used to encode source excerpts in a uniform way in Libadalang, regardless of the input encoding. In order to do something useful with them, one must transcode it to UTF-8, for example. One way to do this is to use GNATCOLL.Iconv, but this is out of the scope of this post.

So here we are! Now you know how to:

• iterate on the stream of tokens/trivia in the resulting analysis unit, as well as process the associated data;

• traverse the syntax tree of this unit;

• combine the above in order to create a syntax highlighter.

Thank you for reading this post to the end! If you are interested in pursuing this road, you can find a compilable set of sources for this syntax highlighter on Libadalang’s repository on Github. And because we cannot decently dedicate a whole blog post to a syntax highlighter without a little demo, here is one:

]]>

When things don’t work as expected, developers usually do one of two things: either add debug prints to their programs, or run their programs under a debugger. Today we’ll focus on the latter activity.

Debuggers are fantastic tools. They turn our compiled programs from black boxes into glass ones: you can interrupt your program at any point during its execution, see where it stopped in the source code, inspect the value of your variables (even some specific array item), follow chains of accesses, and even modify all these values live. How powerful! However, sometimes there’s so much information available that navigating through it to reach the bit of state you want to inspect is just too complex.

Take a complex container, such as an ordered map from Ada.Containers.Ordered_Maps, for example. These are implemented in GNAT as  binary trees: collections of nodes, each node having a link to its parent and its left and children a key and a value. Unfortunately, finding a particular node in a debugger that corresponds to the key you are looking for is a painful task. See for yourself:

with Ada.Containers.Ordered_Maps;

procedure PP is
package Int_To_Nat is

Map : Int_To_Nat.Map;
begin
for I in 0 .. 9 loop
Map.Insert (I, 10 * I);
end loop;

Map.Clear;  --  BREAK HERE
end PP;

Build this program with debug information and execute it until line 13:

$gnatmake -q -g pp.adb$ gdb -q ./pp
Breakpoint 1 at 0x406a81: file pp.adb, line 13.
(gdb) r
Breakpoint 1, pp () at pp.adb:13
13         Map.Clear;  --  BREAK HERE

(gdb) print map
$1 = (tree => (first => 0x64e010, last => 0x64e1c0, root => 0x64e0a0, length => 10, tc => (busy => 0, lock => 0))) # “map” is a record that contains a bunch of accesses… # not very helpful. We need to go deeper. (gdb) print map.tree.first.all$2 = (parent => 0x64e040, left => 0x0, right => 0x0, color => black, key => 0, element => 0)

# Ok, so what we just saw above is the representation of the node
# that holds the key/value association for key 0 and value 0. This
# first node has no child (see left and right above), so we need to
# inspect its parent:
(gdb) print map.tree.first.parent.all
$3 = (parent => 0x64e0a0, left => 0x64e010, right => 0x64e070, color => black, key => 1, element => 10) # Great, we havethe second element! It has a left child, # which is our first node ($2). Now let’s go to its right
# child:
(gdb) print map.tree.first.parent.right.all
$4 = (parent => 0x64e040, left => 0x0, right => 0x0, color => black, key => 2, element => 20) # That was the third element: this one has no left or right # child, so we have to get to the parent of$3:
(gdb) print map.tree.first.parent.parent.all
$5 = (parent => 0x0, left => 0x64e040, right => 0x64e100, color => black, key => 3, element => 30) # So far, so good: we already visited the left child ($4), so
# now we need to visit $5’s right child: (gdb) print map.tree.first.parent.parent.right.all$6 = (parent => 0x64e0a0, left => 0x64e0d0, right => 0x64e160, color => black, key => 5, element => 50)

# Key 5? Where’s the node for the key 4? Oh wait, we should
# also visit the left child of $6: (gdb) print map.tree.first.parent.parent.right.left.all$7 = (parent => 0x64e100, left => 0x0, right => 0x0, color => black, key => 4, element => 40)

# Ad nauseam…

Manually visiting a binary tree is much easier for computers than it is for humans, as everyone knows. So in this case, it seems easier to write a debug procedure in Ada that iterates on the container and prints each key/value association and call this debug procedure from GDB.

But this has its own drawbacks: first, it forces you to remember to write this procedure for each container instantiation you do, eventually forcing you to rebuild your program each time you debug it. But there’s worse: if you debug your program from a core dump, it’s not possible to call a debug procedure from GDB. Besides, if the state of your program is somehow corrupted, due to stack overflow or a subtle memory handling bug (dangling pointers, etc.), calling this debug procedure will probably corrupt your process even more, making the debugging session a nightmare!

This is where GDB comes to the rescue: there’s a feature called pretty-printers, which makes it possible to hook into GDB to customize how it displays values. For example, you can write a hook that intercepts values whose type matches the instantiation of ordered maps and that displays only the “useful” content of the maps.

We developed several GDB scripts to implement such pretty-printers for the most common standard containers in Ada: not only vectors, hashed/ordered maps/sets, linked lists, but also unbounded strings. You can find them in the dedicated repository hosted on GitHub: https://github.com/AdaCore/gnat-gdb-scripts.

With these scripts properly installed, inspecting the content of containers becomes much easier:

(gdb) print map

#### Step 2: Construct the Suffix Tree for the string of hash codes

The algorithm by Ukkonen is quite subtle, but it was easy to translate an existing C implementation into Python. For those curious enough, a very instructive series of 6 blog posts leading to this implementation describes Ukkonen's algorithm in details.

#### Step 3: Compute the longest repeated substring in the string of hash codes

For that, we look at the internal node of the Suffix Tree constructed above with the greatest height (computed in number of hashes). Indeed, this internal node corresponds to two or more suffixes that share a common prefix. For example, with string "adacore", there is a single internal node, which corresponds to the common prefix "a" for suffixes "adacore" and "acore", after which the suffixes are different. The children of this internal node in the Suffix Tree contain the information of where the suffixes start in the string (position 0 for "adacore" and 2 for "acore"), so we can compute positions in the string of hash codes where hashes are identical and for how many hash codes. Then, we can translate this information into files, lines of code and number of lines.

The steps above allow to detect only the longest copy-paste across a codebase (in terms of number of hash codes, which may be different from number of lines of code). Initially, we did not find a better way to detect all copy-pastes longer than a certain limit than to repeat steps 2 and 3 after we remove from the string of hash codes those that correspond to the copy-paste previously detected. This algorithm ran in about one hour on the full codebase of GPS, consisting in 350 ksloc (as counted by sloccount), and it reported both very valuable copy-pastes of more than 100 lines of code, as well as spurious ones. To be clear, the spurious ones were not bugs in the implementation, but limitations of the algorithm that captured "copy-pastes" that were valid duplications of similar lines of code. Then we improved it.

# Improvements: Finer-Grain Encoding and Collapsing

The imprecisions of our initial algorithm came mostly from two sources: it was sometimes ignoring too much of the source code, and sometimes too little. That was the case in particular for the abstraction of all identifiers as the wildcard character \$, which led to spurious copy-pastes where the identifiers were semantically meaningful and could not be replaced by any other identifier. We fixed that by distinguishing local identifiers that are abstracted away from global identifiers (from other units) that are preserved, and by preserving all identifiers that could be the names of record components (that is, used in a dot notation like Obj.Component). Another example of too much abstraction was that we abstracted all literals by their kind, which again lead to spurious copy-pastes (think of large aggregates defining the value of constants). We fixed that by preserving the value of literals.

As an example of too little abstraction, we got copy-pastes that consisted mostly of sequences of small 5-to-10 lines subprograms, which could not be refactored usefully to share common code. We fixed that by collapsing sequences of such subprograms into a single hash code, so that their relative importance towards finding large copy-pastes was reduced. We made various other adjustments to the encoding function to modulate the importance of various syntactic constructs, simply by producing more or less hash codes for a given construct. An interesting adjustment consisted in ignoring the closing tokens in a construct (like the "end Proc;" at the end of a procedure) to avoid having copy-pastes that start on such meaningless starting points. It seems to be a typical default of token-based approaches, that our hash-based approach allows to solve easily, by simply not producing a hash for such tokens.

After these various improvements, the analysis of GPS codebase came down to 2 minutes, an impressive improvement from the initial one hour! The code for this version of the copy-paste detector can be found in the GitHub repository of Libadalang.

# Optimizations: Suffix Arrays, Single Pass

To improve on the above running time, we looked for alternative algorithms for performing the same task. And we found one! Suffix Arrays are an alternative to Suffix Trees, which is simpler to implement and from which we saw that we could generate all copy-pastes without regenerating the underlying data structure after finding a given copy-paste. We implemented in Python the algorithm in C++ found in this paper, and the code for this alternative implementation can be found in the GitHub repository of Libadalang. This version found the same copy-pastes as the previous one, as expected, with a running time of 1 minute for the analysis of GPS codebase, a 50% improvement!

Looking more closely at the bridges between Suffix Trees and Suffix Arrays (essentially you can reconstruct one from the other), we also realized that we could use the same one-pass algorithm to detect copy-pastes with Suffix Trees, instead of recreating each time the Suffix Tree for the text where the copy-paste just detected had been removed. The idea is that, instead of repeatedly detecting the longest copy-paste on a newly created Suffix Tree, we traverse the initial Suffix Tree and issue all copy-pastes with a maximal length, where copy-pastes that are not maximal can be easily recognized by checking the previous hash in the candidate suffixes. For example, if two suffixes for this copy-paste start at indexes 5 and 10 in the string of hashes, we check the hashes at indexes 4 and 9: if they are the same, then the copy-paste is not maximal and we do not report it. With this change, the running time for our original algorithm is just above 1 minute for the analysis of GPS codebase, i.e. close to the alternative implementation based on Suffix Arrays.

So we ended up with two implementations for our copy-paste detector, one based on Suffix Trees and one based on Suffix Arrays. We'll need to experiment further to decide which one to keep in a future plug-in for our GPS and GNATbench IDEs.

# Results on GPS

The largest source base on which we tried this tool is our IDE GNAT Programming Studio (GPS). This is about 350'000 lines of source code. It uses object orientation, tends to have medium-sized subprograms (20 to 30 lines), although there are some much longer ones. In fact, we aim at reducing the size of the longest subprograms, and a tool like gnatmetric will help find them. We are happy to report that most of the code duplication occurred in recent code, as we are transitioning and rewriting some of the old modules.

Nonetheless, the tool helped detect a number of duplicate chunks, with very few spurious detections (corresponding to cases where the tool reports a copy-paste that turns out to be only similar code).

Let's take a look at three copy-pastes that were detected.

#### Example 1: Intended temporary duplication of code

gps/gvd/src/debugger-base_gdb-gdb_cli.adb:3267:1: copy-paste of 166 lines detected with code from line 3357 to line 3522 in file gps/gvd/src/debugger-base_gdb-gdb_mi.adb

This is a large subprogram used to handle the Memory view in GPS. We have recently started changing the code to use the gdb MI protocol to communicate with gdb, rather than simulate an interactive session. Since the intent is to remove the old code, the duplication is not so bad, but is useful in reminding us we need to clean things up here, preferably soon before the code diverges too much.

#### Example 2: Unintended almost duplication of code

gps/builder/core/src/commands-builder-scripts.adb:266:1: copy-paste of 21 lines detected with code from line 289 to line 309

This code is in the handling of the python functions GPS.File.compile() and GPS.File.make(). Interestingly enough, these two functions were not doing the same thing initially, and are also documented differently (make attempts to link the file after compiling it). Yet the code is almost exactly the same, except that GPS does not spawn the same build target (see comment in the code below). So we could definitely use an if-expression here to avoid the duplication of the code.

      elsif Command = "compile" then
Info := Get_Data (Nth_Arg (Data, 1, Get_File_Class (Kernel)));
Extra_Args := GNAT.OS_Lib.Argument_String_To_List
(Nth_Arg (Data, 2, ""));

Builder := Builder_Context
(Kernel.Module (Builder_Context_Record'Tag));

Launch_Target (Builder      => Builder,
Target_Name  => Compile_File_Target,    -- <<< use Build_File_Target here for "make"
Mode_Name    => "",
Force_File   => Info,
Extra_Args   => Extra_Args,
Quiet        => False,
Synchronous  => True,
Dialog       => Default,
Background   => False,
Main_Project => No_Project,
Main         => No_File);

Free (Extra_Args);

The tool could be slightly more helpful here by highlighting the exact differences between the two blocks. As the blocks get longer, it is harder to spot a change in one identifier (as is the case here). This is where an integration in our IDEs like GPS and GNATbench would be useful, and of course possibly with some support for automatic refactoring of the code, also based on Libadalang.

#### Example 3: Unintended exact duplication of code

gps/code_analysis/src/codepeer-race_details_models.adb:39:1: copy-paste of 20 lines detected with code from line 41 to line 60 in file gps/code_analysis/src/codepeer-race_summary_models.adb

This one is an exact duplication of a function. The tool could perhaps be slightly more helpful by showing those exact duplicates first, since they will often be the easiest ones to remove, simply by moving the function to the spec.

   function From_Iter (Iter : Gtk.Tree_Model.Gtk_Tree_Iter) return Natural is
pragma Warnings (Off);
function To_Integer is
pragma Warnings (On);

begin
if Iter = Gtk.Tree_Model.Null_Iter then
return 0;

else
end if;
end From_Iter;

# Setup Recipe

So you actually want to try the above scripts on your own codebase? This is possible right now with your latest GNAT Pro release or the latest GPL release for community & academic users! Just follow the instructions we described in the Libadalang repository, you will then be able to run the scripts inside your favorite Python2 interpreter.

# Conclusion

What we took from this experiment was that: (1) it is easier than you think to develop a copy-paste detector for your favorite language, and (2) technology like Libadalang is key to facilitate the necessary experiments that lead to an efficient and effective detector. On the algorithmic side, we think it's very beneficial to use a string of hash codes as intermediate representation, as this allows to precisely weigh in which language constructs contribute what.

Interestingly, we did not find other tools or articles describing this type of intermediate approach between token-based approaches and syntactic approaches, which provides an even faster analysis than token-based approaches, while avoiding their typical pitfalls, and allows fine-grained control based on the syntactic structure, without suffering from the long running time typical of syntactic approaches.

We look forward to integrating our copy-paste detector in GPS and GNATbench, obviously initially for Ada, but possibly for other languages as well (for example C and Python) as progress on langkit, the Libadalang's underlying technology, allows. The integration of Libadalang in GPS was completed not long ago, so it's easier than ever.

]]>
VerifyThis Challenge in SPARK https://blog.adacore.com/verifythis-challenge-in-spark Fri, 28 Apr 2017 04:00:00 +0000 Yannick Moy https://blog.adacore.com/verifythis-challenge-in-spark

This year again, the VerifyThis competition took place as part of ETAPS conferences. This is the occasion for builders and users of formal program verification platforms to use their favorite tools on common challenges. The first challenge this year was a good fit for SPARK, as it revolves around proving properties of an imperative sorting procedure.

I am going to use this challenge to show how one can reach different levels of software assurance with SPARK. I'm referring here to the five levels of software assurance that we have used in our guidance document with Thales:

• Stone level - valid SPARK
• Bronze level - initialization and correct data flow
• Silver level - absence of run-time errors (AoRTE)
• Gold level - proof of key integrity properties
• Platinum level - full functional proof of requirements

## Stone level - valid SPARK

We start with a simple translation in Ada of the simplified variant of pair insertion sort given in page 2 of the challenge sheet:

package Pair_Insertion_Sort with
SPARK_Mode
is
subtype Index is Integer range 0 .. Integer'Last-1;
type Arr is array (Index range <>) of Integer
with Predicate => Arr'First = 0;

procedure Sort (A : in out Arr);

end Pair_Insertion_Sort;

package body Pair_Insertion_Sort with
SPARK_Mode
is
procedure Sort (A : in out Arr) is
I, J, X, Y, Z : Integer;
begin
I := 0;
while I < A'Length-1 loop
X := A(I);
Y := A(I+1);
if X < Y then
Z := X;
X := Y;
Y := Z;
end if;

J := I - 1;
while J >= 0 and then A(J) > X loop
A(J+2) := A(J);
J := J - 1;
end loop;
A(J+2) := X;

while J >= 0 and then A(J) > Y loop
A(J+1) := A(J);
J := J - 1;
end loop;
A(J+1) := Y;
I := I+2;
end loop;

if I = A'Length-1 then
Y := A(I);
J := I - 1;
while J >= 0 and then A(J) > Y loop
A(J+1) := A(J);
J := J - 1;
end loop;
A(J+1) := Y;
end if;
end Sort;

end Pair_Insertion_Sort;

Stone level is reached immediately on this code, as it is in the SPARK subset of Ada.

## Bronze level - initialization and correct data flow

Bronze level is also reached immediately, although the flow analysis in SPARK detected a problem the first time I ran it:

pair_insertion_sort.adb:35:39: medium: "Y" might not be initialized
pair_insertion_sort.adb:39:20: medium: "Y" might not be initialized


The problem was that the line initializing Y to A(I) near the end of the program was missing from my initial version. As I copied the algorithm in pseudo-code from the PDF and removed the comments in the badly formatted result, I also removed that line of code! After restoring that line, flow analysis did not complain anymore, I had reached Bronze level.

## Silver level - absence of run-time errors (AoRTE)

Then comes Silver level, which was suggested as an initial goal in the challenge: proving absence of runtime errors. As the main reason for run-time errors in this code is the possibility of indexing outside of array bounds, we need to provide bounds for the values of variables I and J used to index A. As these accesses are performed inside loops, we need to do so in loop invariants. Exactly 5 loop invariants are needed here, and with these GNATprove can prove the absence of run-time errors in the code in 7 seconds on my machine:

package body Pair_Insertion_Sort with
SPARK_Mode
is
procedure Sort (A : in out Arr) is
I, J, X, Y, Z : Integer;
begin
I := 0;
while I < A'Length-1 loop
X := A(I);
Y := A(I+1);
if X < Y then
Z := X;
X := Y;
Y := Z;
end if;

J := I - 1;
while J >= 0 and then A(J) > X loop
A(J+2) := A(J);
pragma Loop_Invariant (J in 0 .. A'Length-3);
J := J - 1;
end loop;
A(J+2) := X;

while J >= 0 and then A(J) > Y loop
A(J+1) := A(J);
pragma Loop_Invariant (J in 0 .. A'Length-3);
J := J - 1;
end loop;
A(J+1) := Y;

pragma Loop_Invariant (I in 0 .. A'Length-2);
pragma Loop_Invariant (J in -1 .. A'Length-3);
I := I+2;
end loop;

if I = A'Length-1 then
Y := A(I);
J := I - 1;
while J >= 0 and then A(J) > Y loop
A(J+1) := A(J);
pragma Loop_Invariant (J in 0 .. A'Length-2);
J := J - 1;
end loop;
A(J+1) := Y;
end if;
end Sort;

end Pair_Insertion_Sort;


## Gold level - proof of key integrity properties

Then comes Gold level, which is the first task of the verification challenge: proving that A is sorted on return. This can be expressed easily with a ghost function Sorted:

 function Sorted (A : Arr; I, J : Integer) return Boolean is
(for all K in I .. J-1 => A(K) <= A(K+1))
with Ghost,
Pre => J > Integer'First
and then (if I <= J then I in A'Range and J in A'Range);

procedure Sort (A : in out Arr) with
Post => Sorted (A, 0, A'Length-1);

Then, we need to augment the previous loop invariants with enough information to prove this sorting property. Let's look at the first (inner) loop. The invariant of this loop is that it maintains the array sorted up to the current high bound I+1, except for a hole of one index at J+1. That's easily expressed with the ghost function Sorted:

            pragma Loop_Invariant (Sorted (A, 0, J));
pragma Loop_Invariant (Sorted (A, J+2, I+1));

For these loop invariants to be proved, we need to know that the next value possibly passed across the hole at the next iteration (from index J-1 to J+1) is lower than the value currently at J+2. Well, we know that the value at index J-1 is lower than the value at index J thanks to the property Sorted(A,0,J). All we need to add is that A(J+2)=A(J):

 pragma Loop_Invariant (A(J+2) = A(J));

This proves the loop invariant. Next, after the loop, value X is inserted at index J+2 in the array. For the sorting property to hold from J+2 upwards after the loop, X needs to be less than the value at J+2 in the loop invariant. As A(J+2) and A(J) are equal, we can write:

 pragma Loop_Invariant (A(J) > X);

All the other loops are similar. With these additional loop invariants, GNATprove can prove the sorting property in the code in 12 seconds on my machine:

package body Pair_Insertion_Sort with
SPARK_Mode
is
procedure Sort (A : in out Arr) is
I, J, X, Y, Z : Integer;
begin
I := 0;
while I < A'Length-1 loop
X := A(I);
Y := A(I+1);
if X < Y then
Z := X;
X := Y;
Y := Z;
end if;

J := I - 1;
while J >= 0 and then A(J) > X loop
A(J+2) := A(J);
--  loop invariant for absence of run-time errors
pragma Loop_Invariant (J in 0 .. A'Length-3);
--  loop invariant for sorting
pragma Loop_Invariant (Sorted (A, 0, J));
pragma Loop_Invariant (Sorted (A, J+2, I+1));
pragma Loop_Invariant (A(J+2) = A(J));
pragma Loop_Invariant (A(J) > X);
J := J - 1;
end loop;
A(J+2) := X;

while J >= 0 and then A(J) > Y loop
A(J+1) := A(J);
--  loop invariant for absence of run-time errors
pragma Loop_Invariant (J in 0 .. A'Length-3);
--  loop invariant for sorting
pragma Loop_Invariant (Sorted (A, 0, J));
pragma Loop_Invariant (Sorted (A, J+1, I+1));
pragma Loop_Invariant (A(J+1) = A(J));
pragma Loop_Invariant (A(J) > Y);
J := J - 1;
end loop;
A(J+1) := Y;

--  loop invariant for absence of run-time errors
pragma Loop_Invariant (I in 0 .. A'Length-2);
--  loop invariant for sorting
pragma Loop_Invariant (J in -1 .. A'Length-3);
pragma Loop_Invariant (Sorted (A, 0, I+1));
I := I+2;
end loop;

if I = A'Length-1 then
Y := A(I);
J := I - 1;
while J >= 0 and then A(J) > Y loop
A(J+1) := A(J);
--  loop invariant for absence of run-time errors
pragma Loop_Invariant (J in 0 .. A'Length-2);
--  loop invariant for sorting
pragma Loop_Invariant (Sorted (A, 0, J));
pragma Loop_Invariant (Sorted (A, J+1, A'Length-1));
pragma Loop_Invariant (A(J+1) = A(J));
pragma Loop_Invariant (A(J) > Y);
J := J - 1;
end loop;
A(J+1) := Y;
end if;
end Sort;

end Pair_Insertion_Sort;


## Platinum level - full functional proof of requirements

Then comes Platinum level, which is the second task of the verification challenge: proving that A on return is a permutation of its value on entry. For that, we are going to use the ghost function Is_Perm that my colleague Claire Dross presented in this blog post, that expresses that the number of occurrences of any integer in arrays A and B coincide (that is, they represent the same multisets, which is a way to express that they are permutations of one another):

function Is_Perm (A, B : Arr) return Boolean is
(for all E in Integer => Occ (A, E) = Occ (B, E));


and the procedure Swap that she presented in the same blog post, for which I simply give the contract here:

  procedure Swap (Values : in out Arr;
X      : in     Index;
Y      : in     Index)
with
Pre  => X in Values'Range
and then Y in Values'Range
and then X /= Y,
Post => Is_Perm (Values'Old, Values)
and then Values (X) = Values'Old (Y)
and then Values (Y) = Values'Old (X)
and then (for all Z in Values'Range =>
(if Z /= X and Z /= Y then Values (Z) = Values'Old (Z)))


The only changes I made with respect to her initial version were to change in various places Natural for Integer, as arrays in our challenge store integers instead of natural numbers. In order to use the same strategy that she showed for selection sort on our pair insertion sort, we need to rewrite a bit the algorithm to swap array cells (as suggested in the challenge specification). For example, instead of the assignment in the first (inner) loop:

  A(J+2) := A(J);

we now have:

 Swap (A, J+2, J);

This has an effect on the loop invariants seen so far, which must be slightly modified. Then, we need to express that every loop maintains in A a permutation of the entry value for A. For that, we create a ghost constant B that stores the entry value of A:

B : constant Arr(A'Range) := A with Ghost;


and use this constant in loop invariants of the form:

pragma Loop_Invariant (Is_Perm (B, A));


We also need to express that the values X and Y that are pushed down the array are indeed the ones found around index J in all loops. For example in the first (inner) loop:

pragma Loop_Invariant ((A(J) = X and A(J+1) = Y) or (A(J) = Y and A(J+1) = X));

With these changes, GNATprove can prove the permutation property in the code in 27 seconds on my machine (including all the ghost code copied from Claire's blog post):

   procedure Sort (A : in out Arr) is
I, J, X, Y, Z : Integer;
B : constant Arr(A'Range) := A with Ghost;
begin
I := 0;
while I < A'Length-1 loop
X := A(I);
Y := A(I+1);
if X < Y then
Z := X;
X := Y;
Y := Z;
end if;

J := I - 1;
while J >= 0 and then A(J) > X loop
Swap (A, J+2, J);
--  loop invariant for absence of run-time errors
pragma Loop_Invariant (J in 0 .. A'Length-3);
--  loop invariant for sorting
pragma Loop_Invariant (Sorted (A, 0, J-1));
pragma Loop_Invariant (Sorted (A, J+2, I+1));
pragma Loop_Invariant (if J > 0 then A(J+2) >= A(J-1));
pragma Loop_Invariant (A(J+2) > X);
--  loop invariant for permutation
pragma Loop_Invariant (Is_Perm (B, A));
pragma Loop_Invariant ((A(J) = X and A(J+1) = Y) or (A(J) = Y and A(J+1) = X));
J := J - 1;
end loop;
if A(J+2) /= X then
Swap (A, J+2, J+1);
end if;

while J >= 0 and then A(J) > Y loop
Swap (A, J+1, J);
--  loop invariant for absence of run-time errors
pragma Loop_Invariant (J in 0 .. A'Length-3);
--  loop invariant for sorting
pragma Loop_Invariant (Sorted (A, 0, J-1));
pragma Loop_Invariant (Sorted (A, J+1, I+1));
pragma Loop_Invariant (if J > 0 then A(J+1) >= A(J-1));
pragma Loop_Invariant (A(J+1) > Y);
--  loop invariant for permutation
pragma Loop_Invariant (Is_Perm (B, A));
pragma Loop_Invariant (A(J) = Y);
J := J - 1;
end loop;

--  loop invariant for absence of run-time errors
pragma Loop_Invariant (I in 0 .. A'Length-2);
--  loop invariant for sorting
pragma Loop_Invariant (J in -1 .. A'Length-3);
pragma Loop_Invariant (Sorted (A, 0, I+1));
--  loop invariant for permutation
pragma Loop_Invariant (Is_Perm (B, A));
I := I+2;
end loop;

if I = A'Length-1 then
Y := A(I);
J := I - 1;
while J >= 0 and then A(J) > Y loop
Swap (A, J+1, J);
--  loop invariant for absence of run-time errors
pragma Loop_Invariant (J in 0 .. A'Length-2);
--  loop invariant for sorting
pragma Loop_Invariant (Sorted (A, 0, J-1));
pragma Loop_Invariant (Sorted (A, J+1, A'Length-1));
pragma Loop_Invariant (if J > 0 then A(J+1) >= A(J-1));
pragma Loop_Invariant (A(J+1) > Y);
--  loop invariant for permutation
pragma Loop_Invariant (Is_Perm (B, A));
pragma Loop_Invariant (A(J) = Y);
J := J - 1;
end loop;
end if;
end Sort;


The complete code for this challenge can be found on GitHub.

## Conclusions

Two features of SPARK were particularly useful to help debug unprovable properties. Counterexamples was the first, issuing messages such as the following:

pair_insertion_sort.adb:20:16: medium: array index check might fail (e.g. when A = (others => 1) and A'Last = 3 and J = 2)

Here it allowed me to realize that the upper bound I set for J was too high, so that J+2 was outside of A's bounds.

The second very useful feature was the ability to execute assertions and contracts, issuing messages such as the following:

raised SYSTEM.ASSERTIONS.ASSERT_FAILURE : Loop_Invariant failed at pair_insertion_sort.adb:51

Here it allowed me to realize that a loop invariant which I was trying to prove was in fact incorrect!

As this challenge shows, the five levels of software assurance in SPARK do not require the same level of effort at all. Stone and Bronze levels are rather easy to achieve on small codebases (although they might require refactoring on large codebases), Silver level requires some modest effort, Gold level requires more expertise to drive automatic provers, and finally Platinum level requires a much larger effort than all the previous levels (including possibly some rewriting of the algorithm to make automatic proof possible like here).

Thanks to the organisers of this year's VerifyThis competition, and congrats to the winning teams, two of which used Why3!

]]>
GPS for bare-metal developers https://blog.adacore.com/gps-for-bare-metal-development Wed, 19 Apr 2017 12:57:03 +0000 Anthony Leonardo Gracio https://blog.adacore.com/gps-for-bare-metal-development

In my previous blog article, I exposed some techniques that helped me rewrite the Crazyflie’s firmware from C into Ada and SPARK 2014, in order to improve its safety.

I was still an intern at that time and, back in the day, the support for bare-metal development in GPS was a bit minimalistic: daily activities like flashing and debugging my own firmware on the Crazyflie were a bit painful to do without having to go outside of GPS.

This is not the case anymore. GPS now comes with a number of features regarding bare-metal development that make it very easy for newcomers (as I was) to develop their own software for a particular board.

## Bare-metal Holy Grail: Build, Flash, Debug

Building your modified software in order to flash it or debug it on a board is a very common workflow in bare-metal development. GPS offers support for all these different steps, allowing you to perform them at once with one single click.

In particular, GPS now supports two different tools for connecting to your remote board in order to flash and/or debug it:

• ST-Link utility tools (namely st-util and st-flash) for STM32-based boards

• OpenOCD, a connection tool supporting various types of boards and probes, specifically the ones that use a JTAG interface

Once installed on your host, using these tools in order to flash or debug your project directly from GPS is very easy. As pictures are worth a thousand words, here is a little tutorial video showing how to set up your bare-metal project from GPS in order to build, flash and debug it on a board:

## Monitoring the memory usage

When it comes to bare-metal development, flashing and debugging your project on the targeted board is already a pretty advanced step: it means that you have already been able to compile and, above all, to link your software correctly.

The linking phase can be a real pain due to the limited memory resources of these boards: writing software that does not fit in the board’s available memory is something that can happen pretty quickly as long as your project grows and grows.

To address these potential issues, a Memory Usage view has been introduced. By default, this view is automatically spawned each time you build your executable and displays a complete view of the static memory usage consumed by your software, even when the linking phase has failed. The Memory Usage view uses a map file generated by the GNU ld linker to report the memory usage consumed at three different levels:

1. Memory regions, which correspond to the MEMORY blocks defined in the linker script used to link the executable
2. Memory sections (e.g: .data, .bss etc.)
3. Object files

Having a complete report of the memory usage consumed at each level makes it very convenient to identify which parts of your software are consuming too much memory for the available hardware ressources. This is the case in the example below, where we can see that the .bss section of our executable is just too big for the board's RAM, due to the huge uninitialized array declared in leds.ads.

## Conclusion

Bare-metal development support has been improved greatly in GPS, making it easier for newcomers to build, flash and debug their software. Moreover, the Memory Usage view allows the bare-metal developers to clearly identify the cause of memory overflows.

We don't want to stop our efforts regarding bare-metal development support so don't hesitate to submit ideas of improvements on our GitHub's repository!

]]>
User-friendly strings API https://blog.adacore.com/user-friendly-strings-api Mon, 10 Apr 2017 13:47:00 +0000 Emmanuel Briot https://blog.adacore.com/user-friendly-strings-api

# User friendly strings API

In a previous post, we described the design of a new strings package, with improved performance compared to the standard Ada unbounded strings implementation. That post focused on various programming techniques used to make that package as fast as possible.

This post is a followup, which describes the design of the user API for the strings package - showing various solutions that can be used in Ada to make user-friendly data structures.

## Tagged types

One of the features added in Ada 2005 is a prefix dot notation when calling primitive operations of tagged types. For instance, if you have the following declarations, you can call the subprograms Slice in one of two ways:

declare
type XString is tagged private;
procedure Slice (Self : in out XString; Low, High : Integer);   --  a primitive operation
S : XString;
begin
S.Slice (1, 2);      --  the two calls do the same thing, here using a prefix notation
Slice (S, 1, 2);
end;

This is a very minor change. But in practice, people tend to use the prefix notation because it is more "natural" for people who have read or written code in other programming languages.

In fact, it is so popular that there is some demand to extend this prefix notation, in some future version of the language, to types other than tagged types. Using a tagged type has a cost, since it makes variables slightly bigger (they now have a hidden access field).

In our case, though, a XString was already a tagged type since it is also controlled, so there is no additional cost.

## Indexing

The standard Ada strings are quite convenient to use (at least once you understand that you should use declare blocks since you need to know their size in advance). For instance, one can access characters with expressions like:

A : constant Character := S (2);      --  assuming 2 is a valid index
B : constant Character := S (S'First + 1);   --  better, of course
C : constant Character := S (S'Last);


The first line of the above example hides one of the difficulties for newcomers: strings can have any index range, so using just "2" is likely to be wrong here. Instead, the second line should be used. The GNATCOLL strings avoid this pitfall by always indexing from 1. As was explained in the first blog post, this is both needed for the code (so that internally we can reallocate strings as needed without changing the indexes manipulated by the user), and more intuitive for a lot of users.

The Ada unbounded string has a similar approach, and all strings are indexed starting at 1. But you can't use the same code as above, and instead you need to write the more cumbersome:

S := To_Unbounded_String (...);
A : constant Character := Element (S, 2);     --  second character of the string, always
B : constant Character := Ada.Strings.Unbounded.Element (S, 2);   --  when not using use-clauses

GNATCOLL.Strings takes advantage of some new aspects introduced in Ada 2012 to provide custom indexing functions. So the spec looks like:

type XString is tagged private
with Constant_Indexing  => Get;

function Get (Self : XString; Index : Positive) return Character;

S : XString := ...;
A : constant Character := S (2);    --   always the second character

After adding this simple Constant_Indexing aspect, we are now back to the simple syntax we were using for standard String, using parenthesis to point to a specific character. But here we also know that "2" is always the second character, so we do not need to use the 'First attribute.

## Variable indexing

There is a similar aspect, named Variable_Indexing which can be used to modify a specific character of the string, as we would do with a standard string. So we can write:

type XString is tagged private
with Variable_Indexing  => Reference;

type Character_Reference (Char : not null access Char_Type) is limited null record
with Implicit_Dereference => Char;

function Reference
(Self  : aliased in out XString;
Index : Positive) return Character_Reference;

S : XString := ...;
S (2) := 'A';
S (2).Char := 'A';    --  same as above, but not taking advantage of Implicit_Derefence


Although this is simple to use for users of the library, the implementation is actually much more complex than for Constant_Indexing

First of all, we need to introduce a new type, called a Reference Type (in our example this is the Character_Reference), which basically is a type with an access discriminant and the Implicit_Dereference aspect. This type acts as a safer replacement for an access type (for instance it can't be copied quite as easily). Through this type, we have an access to a specific character, and therefore users can replace that character easily.

But the real difficulty is exactly because of this access. Imaging that we assign the string to another one. At that point, they share the internal buffer when using the copy-on-write mode described in the first post. But if we then modify a character we modify it in both strings, although from the user's point of view these are two different instances ! Here are three examples of code that would fail:

begin
S2 := S;
S (2) := 'A';
--  Now both S2 and S have 'A' as their second character
end;

declare
R : Character_Reference := S.Reference (2);
begin
S2 := S;    --   now sharing the internal buffer
R := 'A';   -- Both S2 and S have 'A' as their second character
end;

declare
R : Character_Reference := S.Reference (2);
begin
S.Append ("ABCDEF");
R := 'A';   --  might point to deallocated memory
end;

Of course, we want our code to be safe, so the implementation needs to prevent the above errors. As soon we take a Reference to a character we must ensure the internal buffer is no longer shared. This fixes the first example above: Since the call to S (2) no longer shares the buffer, we are indeed only modifying S and not S2.

The second example looks similar, but in fact the sharing is done after we have taken the reference. So in fact when we take a Reference we also need to make the internal buffer unshareable. This is done by setting a special value for the refcounting, so that the assignment always make a copy of S's internal buffer, rather than share it.

There is a hidden cost to using Variable_Indexing, since we are no longer able to share the buffer as soon as a reference was taken. This is unfortunately unavoidable, and is one of those examples where convenience of use goes counter to performance...

The third example is much more complex. Here, we have a reference to a character in the string, but when we append data to the string we are likely to reallocate memory. The system could be allocating new memory, copying the data, and then freeing the old one. And our existing reference would point to the memory area that we have just freed !

I have not found a solution here (and C++ implementations seem to have similar limitations). We cannot simply prevent reallocation (i.e. changing the size of the string), since the user might simply have taken a reference, changed the character, and dropped the reference. At that point, reallocating would be safe. For now, we rely on documentation and on the fact that it is some extra work to preserve a reference as we did in the example, it is much more natural to save the index, and then take another reference when needed, as in:

declare
Saved : constant Integer := 2;
begin
S.Append ("ABCDEF");
S (Saved) := 'A';     --  safe
end;

## Iteration

Finally, we want to make it easy to traverse a string and read all its characters (and possibly modify them, of course). With the standard strings, one would do:

declare
S : String (1 .. 10) := ...;
begin
for Idx in S'Range loop
S (Idx) := 'A';
end loop;

for C of S loop
Put (C);
end loop
end;

declare
S : Unbounded_String := ....;
begin
for Idx in 1 .. Length (S) loop
Replace_Element (S, Idx, 'A');
Put (Element (S, Idx));
end loop;
end;

Obviously, the version with unbounded strings is a lot more verbose and less user-friendly. It is possible, since Ada 2012, to provide custom iteration for our own strings, via some predefined aspects. As the gem shows, this is a complex operation...

GNAT provides a much simpler aspect that fulfills all practical needs, namely the Iterable aspect. We use it as such:

type XString is tagged private
with Iterable => (First => First,
Next => Next,
Has_Element => Has_Element,
Get => Get);

function First (Self : XString) return Positive is (1);
function Next (Self : XString; Index : Positive) return Positive is (Index + 1);
function Has_Element (Self : XString; Index : Positive) return Boolean is (Index <= Self.Length);
function Get (Self : Xstring; Idx : Positive) return Character;   --   same one we used for indexing

These functions are especially simple here because we know the range of indexes. So basically they are declared as expression functions and made inline. This means that the loop can be very fast in practice. For consistency with the way the "for...of" loop is used with standard Ada containers, we chose to have Get return the character itself, rather than its index.  And now we can use:

S : XString := ...;

for C of S loop
Put (C);
end loop;

As explained, the for..of loop returns the character itself, which means we can't change the string directly. So we need to provide a second iterator that will return the indexes. We do this via a new small type that takes care of everything, as in:

 type Index_Range is record
Low, High : Natural;
end record
with Iterable => (First       => First,
Next        => Next,
Has_Element => Has_Element,
Element     => Element);

function First (Self : Index_Range) return Positive is (Self.Low);
function Next (Self : Index_Range; Index : Positive) return Positive  is (Index + 1);
function Has_Element (Self : Index_Range; Index : Positive)  return Boolean is (Index <= Self.High);
function Element
(Self : Index_Range; Index : Positive) return Positive
is (Index);

function Iterate (Self : XString) return Index_Range
is ((Low => 1, High => Self.Length));

S : XString := ...;

for Idx in S.Iterate loop
S (Idx) := 'A';
end loop;


The type Index_Range is very similar to the previous use of the Iterable aspect. We just need to introduce one extra primitive operation for the string, which is used to initiate the iteration.

In fact, we could make Iterate and Index_Range more complex, so that for instance we can iterate on every other character, or on every 'A' in the string, or any kind of iteration scheme. This would of course require a more complex Index_Range, but opens up nice possibilities.

## Slicing

With standard strings, one can get a slice by using notation like:

  declare
S : String (1 .. 10) := ...;
begin
Put (S (2 .. 5));
S (2 .. 5) := 'ABCD';
end;

Unfortunately, there is no possible equivalent for GNATCOLL strings, because the range notation ("..") cannot be redefined. We have to use code like:

declare
S : XString := ...;
begin
Put (S.Slice (2, 5));
S.Replace (2, 5, 'ABCD');
end;

It would be nice if Ada had a new aspect to let us map the ".." notation to a function, for instance something like:

type XString is tagged private
with  Range_Indexing  => Slice,           --   NOT Ada 2012
Variable_Range_Indexing => Replace;    --   NOT Ada 2012

function Slice (Self : XString; Low, High : Integer) return XString;
--  Low and High must be of same type, but that type could be anything
--  The return value could also be anything

procedure Replace (S : in out XString; Low, High : Integer; Replace_With : String);
--  Likewise, Low and High must be of same type, but this could be anything.
--  and Replace_With could be any type

Perhaps in some future version of the language?

## Conclusion

Ada 2012 provides quite a number of new aspects that can be applied to types, and make user API more convenient to use. Some of them are complex to use, and GNAT sometimes provides a simpler version.

]]>
GNATprove Tips and Tricks: Proving the Ghost Common Divisor (GCD) https://blog.adacore.com/gnatprove-tips-and-tricks-proving-the-ghost-common-denominator-gcd Thu, 06 Apr 2017 04:00:00 +0000 Yannick Moy https://blog.adacore.com/gnatprove-tips-and-tricks-proving-the-ghost-common-denominator-gcd

Euclid's algorithm for computing the greatest common divisor of two numbers is one of the first ones we learn in school (source: myself), and also one of the first algorithms that humans devised (source: WikiPedia). So it's quite appealing to try to prove it with an automatic proving toolset like SPARK. It turns out that proving it automatically is not so easy, just like understanding why it works is not so easy. To prove it, I am going to use a fair amount of ghost code, to convey the necessary information to GNATprove, the SPARK proof tool.

Let's start with the specification of what the algorithm should do: computing the greatest common divisor of two positive numbers. The algorithm can be described for natural numbers or even negative numbers, but we'll keep things simple by using positive inputs. Given that Euclid himself only considered positive numbers, that should be enough for us here. A common divisor C of two integers A and B should divide both of them, which we usually express efficiently in software as (A mod C = 0) and (B mod C = 0). This gives the following contract for GCD:

 function Divides (A, B : in Positive) return Boolean is (B mod A = 0) with Ghost;

function GCD (A, B : in Positive) return Positive with
Post => Divides (GCD'Result, A)
and then Divides (GCD'Result, B);


Note that we're already using some ghost code here: the function Divides is marked as Ghost because it is only used in specifications. The problem with the contract above is that it is satisfied by a very simple implementation that does not actually compute the GCD:

 function GCD (A, B : in Positive) return Positive is
begin
return 1;
end GCD;

And GNATprove manages to prove that easily. So we need to specify the GCD function more precisely, by stating that it returns not only a common divisor of A and B, but the greatest of them:

 function GCD (A, B : in Positive) return Positive with
Post => Divides (GCD'Result, A)
and then Divides (GCD'Result, B)
and then (for all X in GCD'Result + 1 .. Integer'Min (A, B) =>
not (Divides (X, A) and Divides (X, B)));


Here, I am relying on the property that the GCD is lower than both A and B, to only look for a greater divisor between the value returned (GCD'Result) and the minimum of A and B (Integer'Min (A, B)). That contract uniquely defines the GCD, and indeed GNATprove cannot prove that the previous incorrect implementation of GCD implements this contract.

Let's implement now the GCD, starting with a very simple linear search that looks for the greatest common divisor of A and B:

  function GCD (A, B : in Positive) return Positive is
C : Positive := Integer'Min (A, B);
begin
while C > 1 loop
exit when A mod C = 0 and B mod C = 0;
pragma Loop_Invariant (for all X in C .. Integer'Min (A, B) =>
not (Divides (X, A) and Divides (X, B)));
C := C - 1;
end loop;

return C;
end GCD;


We start at the minimum of A and B, and check if this value divides A and B. If so, we're done and we exit the loop. Otherwise we decrease this value by 1 and we continue. This is not efficient, but this computes the GCD as desired. In order to prove that implementation, we need a simple loop invariant that states that no common divisor of A and B has been encountered in the loop so far. GNATprove proves easily that this implementation implements the contract of GCD (with --level=2 -j0 in less than a second).

Can we increase the efficiency of the algorithm? If we had to do this computation by hand, we would try (Integer'Min (A, B) / 2) rather than (Integer'Min (A, B) - 1) as second possible value for the GCD, because we know that no value in between has a chance of dividing both A and B. Here is an implementation of that idea:

function GCD (A, B : in Positive) return Positive is
C : Positive := Integer'Min (A, B);
begin
if A mod C = 0 and B mod C = 0 then
return C;

else
C := C / 2;

while C > 1 loop
exit when A mod C = 0 and B mod C = 0;
pragma Loop_Invariant (for all X in C .. Integer'Min (A, B) =>
not (Divides (X, A) and Divides (X, B)));
C := C - 1;
end loop;

return C;
end if;
end GCD;

I kept the same loop invariant as before, because we need that loop invariant to prove the postcondition of GCD. But GNATprove does not manage to prove that this invariant holds the first time that it enters the loop:

math_simple_half.ads:9:20: medium: postcondition might fail, cannot prove not (Divides (X, A) and Divides (X, B)

Indeed, we know that the GCD cannot lie between (Integer'Min (A, B) / 2) and Integer'Min (A, B) in that case, but provers don't. We have to provide the necessary information for GNATprove to deduce this fact. This is where we are going to use ghost code more extensively. As we need to prove a quantified property, we are going to establish it progressively, though a loop, accumulating the knowledge gathered in a loop invariant:

      ...
C := C / 2;
for J in C + 1 .. Integer'Min (A, B) - 1 loop
pragma Loop_Invariant (for all X in C + 1 .. J =>
not Divides (X, Integer'Min (A, B)));
end loop;

while C > 1 loop
...

With the help of this ghost code, the loop invariant in our original loop is now fully proved! But the loop invariant of the ghost loop itself is not proved, for iterations beyond the initial one:

math_simple_half.adb:24:38: medium: loop invariant might fail after first iteration, cannot prove not Divides (X, Integer'min)


The property we need here is that, given a value J between (Integer'Min (A, B) / 2 + 1) and (Integer'Min (A, B) - 1), J cannot divide Integer'Min (A, B). We are going to establish this property in a separate ghost lemma, that we'll call inside the loop to get this property. The lemma is an abstraction of the property just stated:

  procedure Lemma_Not_Divisor (Arg1, Arg2 : Positive) with
Ghost,
Global => null,
Pre  => Arg1 in Arg2 / 2 + 1 .. Arg2 - 1,
Post => not Divides (Arg1, Arg2)
is
begin
null;
end Lemma_Not_Divisor;


Even better, by isolating the desired property from its context of use, GNATprove can now prove it! Here, it is the underlying prover Z3 which manages to prove the postcondition of Lemma_Not_Divisor. It remains to call this lemma inside the loop:

         ...
C := C / 2;
for J in C + 1 .. Integer'Min (A, B) - 1 loop
Lemma_Not_Divisor (J, Integer'Min (A, B));
pragma Loop_Invariant (for all X in C + 1 .. J =>
not Divides (X, Integer'Min (A, B)));
end loop;

while C > 1 loop
...


We're almost done. The only check that GNATprove does not manage to prove is the range check on the decrement to C in our original loop:

math_simple_half.adb:32:20: medium: range check might fail


It looks trivial though. The loop test states that (C > 1), so it should be easy to prove that C can be decremented and remain positive, right? Well, it helps here to understand how proof is carried on loops, by creating an artificial loop starting from the loop invariant in an iteration and ending in the same loop invariant at the next iteration. Because the loop invariant is not the first statement of the loop here, the loop test (C > 1) is not automatically added to the loop invariant (which would be incorrect in general, if the first statement modified C), so provers do not see this hypothesis when trying to prove the range check later. All that is needed here is to explicitly carry the loop test inside the loop invariant. Here is the final implementation of the more efficient GCD, which is proved easily by GNATprove (with --level=2 -j0 in less than two seconds):

 procedure Lemma_Not_Divisor (Arg1, Arg2 : Positive) with
Ghost,
Global => null,
Pre  => Arg1 in Arg2 / 2 + 1 .. Arg2 - 1,
Post => not Divides (Arg1, Arg2)
is
begin
null;
end Lemma_Not_Divisor;

function GCD (A, B : in Positive) return Positive is
C : Positive := Integer'Min (A, B);
begin
if A mod C = 0 and B mod C = 0 then
return C;

else
C := C / 2;
for J in C + 1 .. Integer'Min (A, B) - 1 loop
Lemma_Not_Divisor (J, Integer'Min (A, B));
pragma Loop_Invariant (for all X in C + 1 .. J =>
not Divides (X, Integer'Min (A, B)));
end loop;

while C > 1 loop
exit when A mod C = 0 and B mod C = 0;
pragma Loop_Invariant (C > 1);
pragma Loop_Invariant (for all X in C .. Integer'Min (A, B) =>
not (Divides (X, A) and Divides (X, B)));
C := C - 1;
end loop;

return C;
end if;
end GCD;


It's time to try to prove Euclid's variant of the algorithm, still with the same contract on GCD:

 function GCD (A, B : in Positive) return Positive is
An : Positive := A;
Bn : Natural := B;
C  : Positive;
begin
while Bn /= 0 loop
C  := An;
An := Bn;
Bn := C mod Bn;
end loop;

return An;
end GCD;


The key insight here to justify the large jumps when decreasing the value of C is that, at each iteration of the loop, any common divisor between A and B is still a common divisor of An and Bn. And conversely. So the greatest common divisor of A and B turns out to be also the greatest common divisor of An and Bn. Which is just An when Bn turns out to be zero. Here is the loop invariant that expresses that property:

         pragma Loop_Invariant
(for all X in Positive =>
(Divides (X, A) and Divides (X, B))
=
(Divides (X, An) and (Bn = 0 or else Divides (X, Bn))));


In order to prove this loop invariant, we need to state a ghost lemma like before, that helps prove the property needed at a particular iteration of the loop:

 procedure Lemma_Same_Divisor_Mod (A, B : Positive) with
Ghost,
Global => null,
Post => (for all X in Positive =>
(Divides (X, A) and Divides (X, B))
=
(Divides (X, B) and then (Divides (B, A) or else Divides (X, A mod B))))

Because this lemma is more complex, it is not proved automatically like the previous one. Here, we need to implement it with ghost code, extracting three further lemmas that help proving this one. Here is the complete code for this implementation:

procedure Lemma_Divisor_Mod (A, B, X : Positive) with
Ghost,
Global => null,
Pre  => Divides (X, A) and then Divides (X, B) and then not Divides (B, A),
Post => Divides (X, A mod B)
is
begin
null;
end Lemma_Divisor_Mod;

procedure Lemma_Divisor_Transitive (A, B, X : Positive) with
Ghost,
Global => null,
Pre  => Divides (B, A) and then Divides (X, B),
Post => Divides (X, A)
is
begin
null;
end Lemma_Divisor_Transitive;

procedure Lemma_Divisor_Mod_Inverse (A, B, X : Positive) with
Ghost,
Global => null,
Pre  => not Divides (B, A) and then Divides (X, A mod B) and then Divides (X, B),
Post => Divides (X, A)
is
begin
null;
end Lemma_Divisor_Mod_Inverse;

procedure Lemma_Same_Divisor_Mod (A, B : Positive) with
Ghost,
Global => null,
Post => (for all X in Positive =>
(Divides (X, A) and Divides (X, B))
=
(Divides (X, B) and then (Divides (B, A) or else Divides (X, A mod B))))
is
begin
for X in Positive loop
if Divides (X, A) and then Divides (X, B) and then not Divides (B, A) then
Lemma_Divisor_Mod (A, B, X);
end if;
if Divides (B, A) and then Divides (X, B) then
Lemma_Divisor_Transitive (A, B, X);
end if;
if not Divides (B, A) and then Divides (X, A mod B) and then Divides (X, B) then
Lemma_Divisor_Mod_Inverse (A, B, X);
end if;
pragma Loop_Invariant
(for all Y in 1 .. X =>
(Divides (Y, A) and Divides (Y, B))
=
(Divides (Y, B) and then (Divides (B, A) or else Divides (Y, A mod B))));
end loop;
end Lemma_Same_Divisor_Mod;

function GCD (A, B : in Positive) return Positive is
An : Positive := A;
Bn : Natural := B;
C  : Positive;
begin
while Bn /= 0 loop
C  := An;
An := Bn;
Bn := C mod Bn;
Lemma_Same_Divisor_Mod (C, An);
pragma Loop_Invariant
(for all X in Positive =>
(Divides (X, A) and Divides (X, B))
=
(Divides (X, An) and (Bn = 0 or else Divides (X, Bn))));
end loop;

pragma Assert (Divides (An, An));
pragma Assert (Divides (An, A));
pragma Assert (Divides (An, B));

pragma Assert (for all X in An + 1 .. Integer'Min (A, B) =>
not (Divides (X, A) and Divides (X, B)));
return An;
end GCD;


The implementation of GCD as well as the main lemma Lemma_Same_Divisor_Mod are proved easily by GNATprove (with --level=2 -j0 in less than ten seconds). The three elementary lemmas Lemma_Divisor_Mod, Lemma_Divisor_Transitive and Lemma_Divisor_Mod_Inverse are not proved automatically. Currently I only reviewed them and convinced myself that they were true. A better solution in the future will be to prove them in Coq, which should not be difficult given that there are similar existing lemmas in Coq, and include them in the SPARK lemma library.

All the code for these versions of GCD can be found in the GitHub repository of SPARK 2014, including another version of GCD where I changed the definition of Divides to reflect more accurately the mathematical notion of divisibility:

   function Divides (A, B : in Positive) return Boolean is
(for some C in Positive => A * C = B)
with Ghost;

This is also proved automatically by GNATprove, and this proof also requires the introduction of suitable lemmas that relate the mathematical notion of divisibility to its programming counterpart (B mod A = 0).

]]>
New strings package in GNATCOLL https://blog.adacore.com/new-strings-package-in-gnatcoll Tue, 04 Apr 2017 13:30:00 +0000 Emmanuel Briot https://blog.adacore.com/new-strings-package-in-gnatcoll

# New strings package in GNATCOLL

GNATCOLL has recently acquired two new packages, namely GNATCOLL.Strings and GNATCOLL.Strings_Impl. The latter is a generic package, one instance of which is provided as GNATCOLL.Strings.

But why a new strings package? Ada already has quite a lot of ways to represent and manipulate strings, is a new one needed?

This new package is an attempt at finding a middle ground between the standard String type (which is efficient but inflexible) and unbounded strings (which are flexible, but could be more efficient).

GNATCOLL.Strings therefore provides strings (named XString, as in extended-strings) that can grow as needed (up to Natural'Last, like standard strings), yet are faster than unbounded strings. They also come with an extended API, which includes all primitive operations from unbounded strings, in addition to some subprograms inspired from GNATCOLL.Utils and the Python and C++ programming languages.

## Small string optimization

GNATCOLL.Strings uses a number of tricks to improve on the efficiency.  The most important one is to limit the number of memory allocations.  For this, we use a trick similar to what all C++ implementations do nowadays, namely the small string optimization.

The idea is that when a string is short, we can avoid all memory allocations altogether, while still keeping the string type itself small. We therefore use an Unchecked_Union, where a string can be viewed in two ways:

    Small string

[f][s][ characters of the string 23 bytes               ]
f  = 1 bit for a flag, set to 0 for a small string
s  = 7 bits for the size of the string (i.e. number of significant
characters in the array)

Big string

[f][c      ][size     ][data      ][first     ][pad    ]
f = 1 bit for a flag, set to 1 for a big string
c = 31 bits for half the capacity. This is the size of the buffer
pointed to by data, and which contains the actual characters of
the string.
size = 32 bits for the size of the string, i.e. the number of
significant characters in the buffer.
data = a pointer (32 or 64 bits depending on architecture)
first = 32 bits, see the handling of substrings below
pad = 32 bits on a 64 bits system, 0 otherwise.
This is because of alignment issues.

So in the same amount of memory (24 bytes), we can either store a small string of 23 characters or less with no memory allocations, or a big string that requires allocation. In a typical application, most strings are smaller than 23 bytes, so we are saving very significant time here.

This representation has to work on both 32 bits systems and 64 bits systems, so we have careful representation clauses to take this into account.  It also needs to work on both big-endian and little-endian systems. Thanks to Ada's representation clauses, this one in fact relatively easy to achieve (well, okay, after trying a few different approaches to emulate what's done in C++, and that did not work elegantly). In fact, emulating via bit-shift operations ended up with code that was less efficient than letting the compiler do it automatically because of our representation clauses.

## Character types

Applications should be able to handle the whole set of unicode characters. In Ada, these are represented as the Wide_Character type, rather than Character, and stored on 2 bytes rather than 1. Of course, for a lot of applications it would be wasting memory to always store 2 bytes per character, so we want to give flexibility to users here.

So the package GNATCOLL.Strings_Impl is a generic. It has several formal parameters, among which:

* Character_Type is the type used to represent each character. Typically,  it will be Character, Wide_Character, or even possibly Wide_Wide_Character. It could really be any scalar type, so for instance we could use this package to represent DNA with its 4-valued nucleobases.

* Character_String is an array of these characters, as would be represented in Ada. It will typically be a String or a Wide_String. This type is used to make this package work with the rest of the Ada world.

Note about unicode: we could also always use a Character, and use UTF-8 encoding internally. But this makes all operations (from taking the length to moving the next character) slower, and more fragile. We must make sure not to cut a string in the middle of a multi-byte sequence. Instead, we manipulate a string of code points (in terms of unicode). A similar choice is made in Ada (String vs Wide_String), Python and C++.

## Configuring the size of small strings

The above is what is done for most C++ implementations nowadays.  The maximum 23 characters we mentioned for a small string depends in fact on several criteria, which impact the actual maximum size of a small string:

* on 32 bits system, the size of the big string is 16 bytes, so the maximum size of a small string is 15 bytes.
* on 64 bits system, the size of the big string is 24 bytes, so the maximum size of a small string is 23 bytes.
* If using a Character as the character type, the above are the actual number of characters in the string. But if you are using a Wide_Character, this is double the maximum length of the string, so a small string is either 7 characters or 11 characters long.

This is often a reasonable number, and given that applications mostly use small strings, we are already saving a lot of allocations. However, in some cases we know that the typical length of strings in a particular context is different. For instance, GNATCOLL.Traces builds messages to output in the log file. Such messages will typically be at most 100 characters, although they can of course be much larger sometimes.

We have added one more formal parameter to GNATCOLL.Strings_Impl to control the maximum size of small strings. If for instance we decide that a "small" string is anywhere from 1 to 100 characters long (i.e. we do not want to allocate memory for those strings), it can be done via this parameter. Of course, in such cases the size of the string itself becomes much larger.

In this example it would be 101 bytes long, rather than the 24 bytes.  Although we are saving on memory allocations, we are also spending more time copying data when the string is passed around, so you'll need to measure the performance here.

The maximum size for the small string is 127 bytes however, because this size and the 1-bit flag need to fit in 1 bytes in the representation clauses we showed above. We tried to make this more configurable, but this makes things significantly more complex between little-endian and big-endian systems, and having large "small" strings would not make much sense in terms of performance anyway.

Typical C++ implementations do not make this small size configurable.

Just like unbounded strings, the strings in this package are not thread safe. This means that you cannot access the same string (read or write) from two different threads without somehow protecting the access via a protected type, locks,...

In practice, sharing strings would rarely be done, so if the package itself was doing its own locking we would end up with very bad performance in all cases, for a few cases where it might prove useful.

As we'll discuss below, it is possible to use two different strings that actually share the same internal buffer, from two different threads. Since this is an implementation detail, this package takes care of guaranteeing the integrity of the shared data in such a case.

## Copy on write

There is one more formal parameter, to configure whether this package should use copy-on-write or not. When copy on write is enabled, you can have multiple strings that internally share the same buffer of characters. This means that assigning a string to another one becomes a reasonably fast operation (copy a pointer and increment a refcount). Whenever the string is modified, a copy of the buffer is done so that other copies of the same string are not impacted.

But in fact, there is one drawback with this scheme: we need reference counting to know when we can free the shared data, or when we need to make a copy of it. This reference counting must be thread safe, since users might be using two different strings from two different threads, but they share data internally.

Thus the reference counting is done via atomic operations, which have some impact on performance. Since multiple threads try to access the same memory addresses, this is also a source of contention in multi-threaded applications.

For this reason, the current C++ standard prevents the use of copy-on-write for strings.

In our cases, we chose to make this configurable in the generic, so that users can decide whether to pay the cost of the atomic operations, but save on the number of memory allocations and copy of the characters.  Sometimes it is better to share the data, but sometimes it is better to systematically copy it. Again, actual measurements of the performance are needed for your specific application.

## Growth strategy

When the current size of the string becomes bigger than the available allocated memory (for instance because you are appending characters), this package needs to reallocate memory. There are plenty of strategies here, from allocating only the exact amount of memory needed (which saves on memory usage, but is very bad in terms of performance), to doubling the current size of the string until we have enough space, as currently done in the GNAT unbounded strings implementation.

The latter approach would therefore allocate space for two characters, then for 4, then 8 and so on.

This package has a slightly different strategy. Remember that we only start allocating memory past the size of small strings, so we will for instance first allocate 24 bytes. When more memory is needed, we multiply this size by 1.5, which some researchers have found to be a good comprise between waste of memory and number of allocations. For very large strings, we always allocate multiples of the memory page size (4096 bytes), since this is what the system will make available anyway. So we will basically allocate the following: 24, 36, 54, 82, 122,...

An additional constraint is that we only ever allocate even number of bytes. This is called the capacity of the string. In the layout of the big string, as shown above, we store half that capacity, which saves one bit that we use for the flag.

## Growing memory

This package does not use the Ada new operator. Instead, we use functions from System.Memory directly, in part so that we can use the realloc system call. This is much more efficient when we need to grow the internal buffer, since in most cases we won't have to copy the characters at all. Saving on those additional copies has a significant impact, as we'll see in the performance measurements below.

## Substrings

One other optimization performed by this package (which is not done for unbounded strings or various C++ implementations) is to optimize substrings when also using copy-on-write.

We simply store the index within the shared buffer of the first character of the string , instead of always starting at one.

From the user's point of view, this is an implementation detail. Strings are always indexed from 1, and internally we convert to an actual position in the buffer. This means that if we need to reallocate the buffer, for instance when the string is modified, we transparently change the index of the first character, but the indexes the user was using are still valid.

This results in very significant savings, as shown below in the timings for Trim for instance. Also, we can do an operation like splitting a string very efficiently.

For instance, the following code doesn't allocate any memory, beside setting the initial value of the string. It parses a file containing some "key=value" lines, with optional spaces, and possibly empty lines:

     declare
S, Key, Value : XString;
L             : XString_Array (1 .. 2);
Last          : Natural;
begin
S.Set (".......");

--  Get each line
for Line in S.Split (ASCII.LF) loop

--  Split into at most two substrings
Line.Split ('=', Into => L, Last => Last);

if Last = 2 then
Key := L (1);
Key.Trim;    --  Removing leading and trailing spaces

Value := L (2);
Value.Trim;

end if;
end loop;
end;

## Conclusion

We use various tricks to improve the performance over unbounded strings.  From not allocating any memory when a string is 24 bytes (or a configurable limit) or less, to growing the string as needed via a careful growth strategy, to sharing internal data until we need to modify it, these various techniques combine to provide a fast and flexible implementation.

Here are some timing experiments done on a laptop (multiple operating systems lead to similar results). The exact timings are irrelevant, so the results are given in percentage of what the unbounded string takes to performance similar operations.

We configured the GNATCOLL.Strings package in different ways, either with or without copy-on-write, and with a small string size from the default 0..23 bytes or a larger 0..127 bytes.

    Setting a small string multiple times (e.g. Set_Unbounded_String)

unbounded_string                 = 100 %
xstring-23 without copy on write =  11 %
xstring-23 with copy on write    =  12 %
xstring-127 with copy on write   =  17 %

Here we see that not doing any memory allocation makes XString
much faster than unbounded string. Most of the time is spent
copying characters around, via memcpy.

Setting a large string multiple times (e.g. Set_Unbounded_String)

unbounded_string                 = 100 %
xstring-23 without copy on write =  41 %
xstring-23 with copy on write    =  50 %
xstring-127 with copy on write   =  32 %

Here, XString apparently proves better are reusing already
allocated memory, although the timings are similar when creating
new strings instead. Most of the difference is probably related to the
use of realloc instead of alloc.

Assigning small strings (e.g.   S2 := S1)

unbounded_string                 = 100 %
xstring-23 without copy on write =  31 %
xstring-23 with copy on write    =  27 %
xstring-127 with copy on write   =  57 %

Assigning large strings (e.g.   S2 := S1)

unbounded_string                 = 100 %
xstring-23 without copy on write = 299 %
xstring-23 with copy on write    =  63 %
xstring-127 with copy on write   =  60 %

When not using copy-on-write (which unbounded strings do), we need
to reallocate memory, which shows on the second line.

Appending to large string  (e.g.    Append (S, "...."))

unbounded_string                 = 100 %
xstring-23 without copy on write =  39 %
xstring-23 with copy on write    =  48 %
same, with tasking           = 142 %
xstring-127 with copy on write   =  49 %

When we use tasking, XStrings use atomic operations for the reference
counter, which slows things down. They become slower than unbounded
strings, because the latter in fact have a bug when using two
different strings from two different threads, and they share data
(they try to save on an atomic operation... this bug is being worked
on).

unbounded_string                 = 100 %
xstring-23 without copy on write =  50 %
xstring-23 with copy on write    =  16 %
xstring-127 with copy on write   =  18 %

Here we see the benefits of the substrings optimization, which shares
data for the substrings.

]]>
Simics helps run 60 000 GNAT Pro tests in 24 hours https://blog.adacore.com/simics-helps-run-60-000-gnat-pro-tests-in-24-hours Fri, 31 Mar 2017 14:06:00 +0000 Jerome Guitton https://blog.adacore.com/simics-helps-run-60-000-gnat-pro-tests-in-24-hours

This post has been updated in March 2017 and was originally posted in March 2016.

A key aspect of AdaCore’s GNAT Pro offering is the quality of the product we’re delivering and our proactive approach to resolving issues when they appear. To do so, we need both intensive testing before delivering anything to our customers and to produce “wavefront” versions every day for each product we offer. Doing so each and every day is a real challenge, considering the number of supported configurations, the number of tests to run, and the limit of a 24-hour timeframe. At AdaCore, we rely heavily on virtualization as part of our testing strategy. In this article, we will describe the extent of our GNAT Pro testing on VxWorks, and how Simics helped us meet these challenges.

## Broad Support for VxWorks Products

The number of tests to run is proportional to the number of configurations that we support. We have an impressive matrix of configurations to validate:

• Versions: VxWorks, VxWorks Cert, and VxWorks 653… available on the full range of versions that we support (e.g. 5.5, 6.4 to 6.9, 653 2.1 to 2.5...)
• CPUs: arm, ppc, e500v2, and x86...
• Program type: Real Time Process, Downloadable Kernel Module, Static Kernel Module, vThreads, ARINC 653 processes, and with the Cert subsets…

Naturally, there are some combinations in this matrix of possibility that are not supported by GNAT, but the reality is we cover most of it. So the variety of configurations is very high.

The matrix is growing fast. Between 2013 and 2015, we have widened our offer to support the new VxWorks ports (VxWorks 7, VxWorks 653 3.0.x) on a large range of CPUs (arm, e500v2, ppc..), including new configurations that were needed by GNAT Pro users (x86_64). This represents 32 new supported configurations to be added to the 168 existing ones. It was obviously a challenge to qualify all these new versions against our existing test suites, with the goal of supporting the new VxWorks versions as soon as possible.

Simics supports a wide range of VxWorks configurations, and has a good integration with the core OS itself. This made Simics a natural solution for our testing strategy of GNAT Pro. On VxWorks 7 and 653 3.0.x, it allowed us to quickly set up an appropriate testing environment, as predefined material exists to make it work smoothly with most Wind River OSes; we could focus on our own technology right away, instead of spending time on developing, stabilizing and maintaining a new testing infrastructure from scratch.

## Representative Testing on Virtual Hardware

Another benefit of Simics is that it not only supports all versions of VxWorks, but also emulates a wide range of hardware platforms. This allows us to test on platforms which are representative of the hardware that will be used in production by GNAT Pro users.

## A stable framework for QA led to productivity increases

Stability is an important property of the testing framework; otherwise, “glitches” caused by the testing framework start causing spurious failures, often randomly, that the QA team then needs to analyze each time. Multiplied by the very large number of tests we run each day, and the large number of platforms we test on, lack of stability can quickly lead to an unmanageable situation. Trust of the test framework is a key factor for efficiency.

In that respect, and despite the heavy parallelism required in order to complete the validation of all our products in time, Simics proved to be a robust solution and behaved well under pressure, thus helping us focus our energy on differences caused by our tools, rather than on spurious differences caused by the testing framework.

## Test execution speed allowed extensive testing

Some additional info about how broad our testing is. On VxWorks, we mostly have three sets of testsuites:

• The standard Ada validation testsuite (ACATS): around 3,600 tests;
• The regression testing of GNAT: around 15,000 tests;
• Tool-specific testsuites: around 1,800 tests.

All in all, just counting the VxWorks platforms, we are running around 350,000 tests each day. 60,000 of them are run on Simics, mostly on the more recent ports (VxWorks 7 and 653 3.x).

In order to run all these tests, an efficient testing platform is needed. With Simics, we were able to optimize the execution by:

• Proper tuning of the simulation target;
• Stopping and restarting the platform at an execution point where tests can be run right away, using checkpoints;

We will give more technical details about these optimizations in a future article.

## March 2017 Update

AdaCore’s use of Simics has not substantially changed since last year: it is still a key component of GNAT Pro’s testing strategy on VxWorks platforms. In 2017, GNAT Pro has been ported to PowerPC 64 VxWorks 7; with its broad support for VxWorks products, Simics has been a natural solution to speed up this port.

## April 2018 Update

This year we smoothly switched to Simics 5 on all supported configurations; Simics has been of great help to speed up the new port of GNAT Pro to AArch64 VxWorks 7.

]]>
Two Projects to Compute Stats on Analysis Results https://blog.adacore.com/two-projects-to-compute-stats-on-analysis-results Thu, 30 Mar 2017 04:00:00 +0000 Yannick Moy https://blog.adacore.com/two-projects-to-compute-stats-on-analysis-results

The project by Daniel King allows you to extract the results from the log file gnatprove.out generated by GNATprove, into an Excel spreadsheet. What's nice is that you can then easily sort units according to the metrics you are following inside your spreadsheet viewer.

The project by Martin Becker allows you to extract the results from the JSON files generated by GNATprove for each unit analyzed, aggregate these results, and output them into either textual or JSON format. What's nice is that you can then integrate the generated JSON with your automated scripts/setup.

I have installed both on my laptop, and used them on examples such as Tokeneer, and I am happy to report that the results are consistent! :-) I don't know yet which one will become more useful with my setup, but both are worth a try.

]]>
GNATcoverage moves to GitHub https://blog.adacore.com/gnatcoverage-moves-to-github Wed, 29 Mar 2017 13:13:51 +0000 Pierre-Marie de Rodat https://blog.adacore.com/gnatcoverage-moves-to-github

Following the current trend, the GNATcoverage project moves to GitHub! Our new address is: https://github.com/AdaCore/gnatcoverage

GNATcoverage is a tool we developed to analyze and report program coverage. It supports the Ada and C programming languages, several native and embedded platforms, as well as various coverage criteria, from object code level instruction and branch coverage up to source level decision or MC/DC coverage, qualified for use in avionics certification contexts. For source-level analysis, GNATcoverage works hand-in-hand with the GNAT Pro compilers.

Originally developed as part of the Couverture research project, GNATcoverage became a supported product for a first set of targets in the 2009/2010 timeframe.

Since the beginning of the project, the development happened on the OpenDO forge. This has served us well, but we are now in the process of moving all our projects to GitHub. What does this change for you?

• We will now use GitHub issues and pull requests for discussions that previously happened on mailing lists. We hope these places will be more visible and community-friendly.

In the near future, we’ll close the project on the OpenDO forge. We are keen to see you on GitHub!

]]>
Writing on Air https://blog.adacore.com/writing-on-air Mon, 27 Mar 2017 13:00:00 +0000 Jorge Real https://blog.adacore.com/writing-on-air

While searching for motivating projects for students of the Real-Time Systems course here at Universitat Politècnica de València, we found a curious device that produces a fascinating effect. It holds a 12 cm bar from its bottom and makes it swing, like an upside-down pendulum, at a frequency of nearly 9 Hz. The free end of the bar holds a row of eight LEDs. With careful and timely switching of those LEDs, and due to visual persistence, it creates the illusion of text... floating in the air!

The web shows plenty of references to different realizations of this idea. They are typically used for displaying date, time, and also rudimentary graphics. Try searching for "pendulum clock LED", for example. The picture in Figure 1 shows the one we are using.

The software behind this toy is a motivating case for the students, and it contains enough real-time and synchronisation requirements to also make it challenging.

We have equipped the lab with a set of these pendulums, from which we have disabled all the control electronics and replaced them with STM32F4 Discovery boards. We use also a self-made interface board (behind the Discovery board in Figure 1) to connect the Discovery with the LEDs and other relevant signals of the pendulum. The task we propose our students is to make it work under the control of a Ravenscar program running on the Discovery. We use GNAT GPL 2016 for ARM hosted on Linux, along with the Ada Drivers Library.

There are two different problems to solve: one is to make the pendulum bar oscillate with a regular period; the other one is to then use the LEDs to display some text.

# Swing that bar!

The bar is fixed from the bottom to a flexible metal plate (see Figure 2). The stable position of the pendulum is vertical and still. There is a permanent magnet attached to the pendulum, so that the solenoid behind it can be energised to cause a repulsion force that makes the bar start to swing.

At startup, the solenoid control is completely blind to the whereabouts of the pendulum. An initial sequence must be programmed with the purpose of moving the bar enough to make it cross the barrier (see detail in Figure 3), a pass detector that uses an opto-coupler sensor located slightly off-center the pendulum run. This asymmetry is crucial, as we'll soon see.

Once the bar crosses the barrier at least three times, we have an idea about the pendulum position along time and we can then apply a more precise control sequence to keep the pendulum swinging regularly. The situation is pretty much like swinging a kid swing: you need to give it a small, regular push, at the right time. In our case, that time is when the pendulum enters the solenoid area on its way to the right side, since the solenoid repels the pendulum rightwards. That happens at about one sixth of the pendulum cycle, so we first need to know when the cycle starts and what duration it has. And for that, we need to pay close attention to the only input of the pendulum: the barrier signal.

Figure 4 sketches a chronogram of the barrier signal. Due to its asymmetric placement, the signal captured from the opto-coupler is also asymmetric.

To determine the start time and period of the next cycle, we take note of the times when rising and falling edges of the barrier signal occur. This is easy work for a small Finite State Machine (FSM), triggered by barrier interrupts to the Discovery board. Once we have collected the five edge times T1 to T5 (normally would correspond to 2 full barrier crossings plus the start of a third one) we can calculate the period by subtracting T5 - T1. Regarding the start time of th