Security-Hardening Software Libraries with Ada and SPARK

by Kyriakos Georgiou – Jul 08, 2021

Part of AdaCore's ongoing efforts under the HICLASS project is to demonstrate how the SPARK technology can play an integral part in the security-hardening of existing software libraries written in other non-security-oriented programming languages such as C. This blog post presents the first white paper under this work-stream, “Security-Hardening Software Libraries with Ada and SPARK”.

In this paper, we have taken a quantitative approach, where SPARK is put under test to demonstrate that it can be relatively easily adopted and that the massive benefits of its adoption do not come with a significant negative impact on the performance of a program. To showcase this, a set of benchmarks from a modern C embedded benchmark suite, namely the EMBENCH suite, are converted to SPARK to guarantee the Absence of RunTime Errors (AoRTE). Runtime errors are a well-known source of security-related vulnerabilities, with dozens of system security breaches related to them. Thus, such runtime errors should be eliminated in the context of high assurance systems to avoid potential exploitations that can lead to safety issues. The work not only achieves the above goals but in the process, a plethora of valuable guidelines and best practices are offered to enable others to rapidly adopt the technology for hardening software libraries.

SPARK Levels of Software Assurance

Adopting SPARK for a new project can seem like an intimidating task, especially when having no prior experience with the technology. The difﬁculty of adopting SPARK depends on the scope of the analysis, such as a whole project or units of a project, and the software assurance level required. Fortunately, there are well-deﬁned guidelines on the adoption of the SPARK technology that simplify the process. These guidelines are offered in the form of ﬁve levels of SPARK assurance, which are incremental in both the effort required and the amount of software assurance they provide. The ﬁve levels are:

Stone level - valid SPARK
Bronze level - initialization and correct data ﬂow
Silver level - Absence of RunTime Errors (AoRTE)
Gold level - proof of key integrity properties
Platinum level - full functional proof of requirements

For this work, we are aiming to reach up to the third level of SPARK assurance. With this level, programs are guaranteed AoRTE, a major source of security vulnerabilities. This also eliminates the need for runtime checks. To move at the highest levels of SPARK requires the precise intent and speciﬁcations of the application to be known, something not always feasible for code that is not written from scratch or well documented, as is the case with the EMBENCH benchmarks conversion.

The EMBENCH Benchmark Suite

EMBENCH is a recently formed free and open-source embedded benchmark suite, intending to provide benchmarks that reﬂect today’s embedded Internet of Things (IoT) applications’ needs. The initiative for establishing the benchmark suite came from Prof. David A. Patterson, one of the principal ﬁgures behind the RISC-V processor. The main idea is to move away from outdated, artiﬁcial benchmarks, such as the Dhrystone and CoreMark, which are no longer representative of modern embedded applications, and introduce a benchmark suite that will continuously evolve to keep up with the new trends in embedded systems.

Currently, EMBENCH includes 19 benchmarks carefully selected to be a representative range of the application space that is typically found in today’s IoT. The suite’s build-system supports both native (x-86, Linux) and cross-compilation of the benchmarks, currently for RISC-V and Arm’s Cortex-M4 based development boards.

Selection of Benchmarks

For a fair performance comparison between the programming languages C and SPARK, the underlying algorithmic logic and the main program structure of the original EMBENCH benchmarks had to be retained in their new correspondent SPARK versions. This limitation can signiﬁcantly impact the effective use of features and capabilities of the Ada/SPARK languages. For example, casting from one scalar type to another scalar type of smaller range or shifting signed integers are things that you typically do not expect to be part of an Ada or SPARK program. Furthermore, there is a limitation on which level of SPARK assurance can be achieved on each benchmark that stems from the direct translation of the C code to firstly equivalent Ada code and then SPARK code; note that the Ada intermediate stage of translation is needed to facilitate a step-wise process and reduce the complexity of the translation.

The original code was not developed with SPARK's formal veriﬁcation technology in mind. This can make it challenging to preserve the original code's logic and structure while enabling the veriﬁcation tools to perform at their optimum level. Finally, in many cases, the lack of sufﬁcient design documentation, particularly in the form of comments within the benchmarks' code, makes it challenging to apply SPARK contracts that capture the semantics of the code. Thus, we can only achieve at best the Silver level of SPARK, AoRTE, which is the default level aimed for high-assurance software.

Under the above considerations, eleven benchmarks, shown in the table below, were deemed applicable for this work. The table also shows the level of SPARK assurance achieved for each benchmark. If a benchmark is not at Silver, the number of its sub-modules that reached the Silver level is also given.

Table 1. EMBENCH benchmarks converted to SPARK and their achieved level of SPARK.
Benchmark	Level Of SPARK	No. of Submodules at Silver level
aha-mont64	Silver	all
crc32	Silver	all
edn	Silver	all
huffbench	Silver	all
matmult-int	Silver	all
nettle-aes	Silver	all
nsichneu	Silver	all
st	Bronze	3/4
ud	Bronze	0/1
minver	Bronze	1/2
nbody	Bronze	1/2

Achieving the Different SPARK Levels

The Implementation Guidance for the Adoption of SPARK manual offers excellent guidance on how to achieve SPARK's several levels of software assurance. These guidelines form the basis for the white paper's work. Its purpose is not to give a comprehensive manual as this is already done in the aforementioned SPARK adoption manual, but rather to give enough context for the reader to understand the process and share the practical experiences of hardening software libraries with Ada and SPARK. Also, any signiﬁcant steps/ﬁndings not found in the SPARK manual used as a starting point, such as a new SPARK feature, are highlighted.

The detailed stepwise process followed, and the several challenges and ﬁndings for each step of converting the benchmarks to Ada and SPARK and then achieving the different levels of SPARK software assurance can be found under Section 4.2 of the relevant white paper.

One thing I would like to highlight regarding how SPARK can be effectively applied to gain software assurance is the bottom-up modular approach that SPARK technology offers. The modularity is mainly two-fold:

Being able to apply the three modes (check, flow analysis, proof) of GNATprove at the line, region, subprogram, and module (ﬁle) level.
Each level of SPARK assurance achieved ensures the conformance of all of its lower SPARK assurance levels.

It is highly recommended to take a bottom-up modular approach when hardening software libraries with SPARK, such that before moving to a higher level of SPARK, all of the lower levels are achieved. Besides, smaller parts of the code, with the lowest amount of interaction, such as leaf-node subprograms in the call-tree of a program, should be targeted ﬁrst with a gradual move towards the ﬁle level. This signiﬁcantly reduces the effort of achieving the different levels of SPARK assurance.

Although for this work SPARK Pro and GNAT Pro versions 21.0w are used, GNAT Community 2021 is also applicable. GNAT Studio 21.0w was the Integrated Development Environment (IDE) used to convert the C benchmarks into SPARK and invoke the GNATprove SPARK static analyser.

Evaluation

The white paper includes a thorough discussion on the performance evaluation, issues captured by SPARK on the original benchmarks, assessment of the time needed for the completion of the SPARK-related tasks, and some issues found and improved within the SPARK technology. We will cover only the effort/time needed and the performance evaluation for this post.

Effort/Time Evaluation

One of the aims of this work was to evaluate the effort needed in achieving the absence of runtime errors with the SPARK technology. The completion of this work lasted around 35 working days. The level of experience with the Ada and SPARK technologies was around four months, although the engineer (myself) that carried out the work had an overall programming experience of about 15 years. Taking this into account and that the Silver level of SPARK was achieved within this time frame for the majority of the benchmarks, 7 out of 11, and for most of the functions for the remaining benchmarks, it is fair to say that SPARK technology is easily accessible and its adoption can yield significant benefits for hardening software libraries for security in a short time.

Performance Evaluation

The performance evaluation was done on an STM32F4DISCOVERY development board. The board is equipped with a 32-bit ARM Cortex-M4 with FPU core, 1-Mbyte Flash memory, 192-Kbyte RAM, and it can run at a maximum frequency of 168MHz. The ARM Cortex-M4 and the RISK-V 32-bit processors are the two embedded processors currently supported in the EMBENCH benchmark suite due to their popularity in the embedded industry.

Both in C and SPARK, the benchmarks were compiled with the GNAT Pro 21.0w compiler using the -O2 optimization ﬂag and with link-time optimizations enabled. The link-time optimizations were essential to allow a fairer comparison between the C and the SPARK versions, as the SPARK code for each benchmark is separated from its test harness source ﬁle while the C benchmarks’ code lives in the same ﬁle as their test-harness. Thus, inlining of the benchmark code was feasible in the C versions and not in the SPARK versions. By enabling link-time optimizations, inlining was also enabled for the SPARK code similarly to the C code. Furthermore, Ada’s runtime checks were disabled, using the -gnatp ﬂag, to make the performance comparison fair with C. Nevertheless, subprograms proved at the SPARK Silver level come with an AoRTE guarantee, and thus runtime checks can be safely disabled. This is essential towards certifying at the highest levels of software assurance of safety standards, such as the DO-178C for avionics or the CENELEC EN 50128 for rail systems. This is equally applicable when certifying against security-critical standards and guidelines such as ED-202A/ED-203A and DO-326A/DO-356A for aviation.

Table 2. Performance comparison of the C and SPARK versions of each benchmark. Note that a positive percentage represents the percentage increase in execution time for a benchmark written in SPARK when compared to the execution time of its corresponding C version, and vice versa, a negative percentage represents the percentage decrease in execution time.
Benchmark	Level Of SPARK	SPARK vs C (Performance)
aha-mont64	Silver	-24.07%
crc32	Silver	0.00%
edn	Silver	8.56%
huffbench	Silver	2.54%
matmult-int	Silver	4.27%
nettle-aes	Silver	10.91%
nsichneu	Silver	7.55%
st	Bronze	-16.70%
ud	Bronze	9.49%
minver	Bronze	-0.66%
nbody	Bronze	38.83%

Table 2 and Figure 1 show the performance comparison for the C and SPARK versions of the benchmarks. Note that even though the "minver" original benchmark is found to be functionally incorrect, the SPARK level implementation matches that behaviour. Thus, a performance comparison is still valid. With the exception of the "nbody" benchmark, there is no signiﬁcant sacriﬁce in performance when moving from C to SPARK. This is because there were no signiﬁcant intrusive modiﬁcations needed to the code to support SPARK proves.

Figure 1. Performance comparison of the C and SPARK versions of each benchmark.

Conclusion

The hardening of existing code-bases is expected to become a commonplace security exercise due to the rise of industry-mandated cyber-security standards and guidelines. This is especially predicted within aerospace with the recent adoption by the European Union Aviation Safety Agency (EASA) of the ED-202A/ED-203A security set as the currently only “Acceptable Means of Compliance” for aviation cyber-security airworthiness certiﬁcation. This is expected to be equally true of the Federal Aviation Administration (FAA) and the technically equivalent DO-326A/DO-356A set.

This is where SPARK can play an integral role in designing a new security architecture (or when security hardening an existing architecture). The white paper summarized by this post shows that existing systems can beneﬁt from the application of a SPARK hardening approach. Elevation of a security component to SPARK Silver level or higher provides strong evidence to support a security effectiveness assurance argument. This is particularly true when arguing over the effectiveness of security measures against threat scenarios relating to application security vulnerabilities.

By design, SPARK aims to eradicate all security bugs, ﬂaws, errors, faults, holes, or weaknesses in software architecture regardless of if threat actors can exploit them. When proven to have achieved Silver level or higher, the guaranteed AoRTE is a powerful countermeasure against cyber-attacks. This work demonstrated that the effort of adopting SPARK is not as hard as perceived; in an arguably short amount of time, a set of benchmarks from the EMBENCH benchmark suite were converted from C to SPARK, and AoRTE was achieved in most of the cases, (see Table 1). In addition, had the complete functional speciﬁcations been available for the remaining, not fully proven benchmarks, the complete set of benchmarks could have achieved the Silver level of SPARK assurance. The SPARK technology was also able to identify a faulty benchmark, namely the "minver", where its implementation was incomplete and produced the wrong results.

Furthermore, as demonstrated, the adoption of the SPARK technology did not signiﬁcantly compromise the C-enabled performance. This and the signiﬁcant security beneﬁt of proving AoRTE make SPARK a default choice when it comes to the hardening of software libraries. Finally, the work demonstrates the steps and best practices for adopting SPARK and highlights the use of new features. These, and the provided references to external documentation, can be used as up-to-date guidelines for the easy adoption of the SPARK technology.

Acknowledgments

The "High-Integrity Complex Large Software and Electronic Systems'' (HICLASS) working group was created to ensure that the UK stays a leading force within civilian aerospace manufacturing. HICLASS is made up of UK academia, tier-one aerospace manufacturers, and associated software development tool providers. This group has been assembled to action the technology strategy set out by the Aerospace Technology Institute (ATI). AdaCore's involvement within this group is to evolve SPARK and associated technologies to ensure they align with the objectives set out within the ever-emerging cyber-security guidelines.

HICLASS is supported by the Aerospace Technology Institute (ATI) Programme, a joint Government and industry investment to maintain and grow the UK’s competitive position in civil aerospace design and manufacture, under grant agreement No. 113213. The programme, delivered through a partnership between the ATI, Department for Business, Energy & Industrial Strategy (BEIS), and Innovate UK, addresses technology, capability, and supply chain challenges. More information about the HICLASS project can be found here.

Posted in #SPARK #STM32 #Embedded

About Kyriakos Georgiou

Dr. Kyriakos Georgiou is a certified Prince2 Project Manager and a Senior Research Engineer at AdaCore UK. He is currently managing the SCHEME (Safety-Critical Harsh Environment Micro-processing Evolution) UK-funded research project at AdaCore, which aims to deliver the next-generation safety-critical, cyber-secure processing platform required for low-carbon complex intelligence systems. Kyriakos is part of the GNATDAS (Dynamic Analysis Suite) team and is actively developing GNATfuzz, a highly automated fuzz-testing solution for Ada and SPARK, which will soon be expanded to support other programming languages. Before joining AdaCore, he was a senior research associate and lecturer at the University of Bristol, UK, where he researched energy estimation and optimization techniques for software and compiler optimization autotuning support. Currently, Kyriakos holds the honorary status of Research Fellow in the Faculty of Engineering of the University of Bristol, UK.