AdaCore Blog

Security-Hardening Software Libraries with Ada and SPARK

by Kyriakos Georgiou

Part of AdaCore's ongoing efforts under the HICLASS project is to demonstrate how the SPARK technology can play an integral part in the security-hardening of existing software libraries written in other non-security-oriented programming languages such as C. This blog post presents the first white paper under this work-stream, “Security-Hardening Software Libraries with Ada and SPARK”.


In this paper, we have taken a quantitative approach, where SPARK is put under test to demonstrate that it can be relatively easily adopted and that the massive benefits of its adoption do not come with a significant negative impact on the performance of a program. To showcase this, a set of benchmarks from a modern C embedded benchmark suite, namely the EMBENCH suite, are converted to SPARK to guarantee the Absence of RunTime Errors (AoRTE). Runtime errors are a well-known source of security-related vulnerabilities, with dozens of system security breaches related to them. Thus, such runtime errors should be eliminated in the context of high assurance systems to avoid potential exploitations that can lead to safety issues. The work not only achieves the above goals but in the process, a plethora of valuable guidelines and best practices are offered to enable others to rapidly adopt the technology for hardening software libraries.

SPARK Levels of Software Assurance

Adopting SPARK for a new project can seem like an intimidating task, especially when having no prior experience with the technology. The difficulty of adopting SPARK depends on the scope of the analysis, such as a whole project or units of a project, and the software assurance level required. Fortunately, there are well-defined guidelines on the adoption of the SPARK technology that simplify the process. These guidelines are offered in the form of five levels of SPARK assurance, which are incremental in both the effort required and the amount of software assurance they provide. The five levels are:

  1. Stone level - valid SPARK
  2. Bronze level - initialization and correct data flow
  3. Silver level - Absence of RunTime Errors (AoRTE)
  4. Gold level - proof of key integrity properties
  5. Platinum level - full functional proof of requirements

For this work, we are aiming to reach up to the third level of SPARK assurance. With this level, programs are guaranteed AoRTE, a major source of security vulnerabilities. This also eliminates the need for runtime checks. To move at the highest levels of SPARK requires the precise intent and specifications of the application to be known, something not always feasible for code that is not written from scratch or well documented, as is the case with the EMBENCH benchmarks conversion.

The EMBENCH Benchmark Suite

EMBENCH is a recently formed free and open-source embedded benchmark suite, intending to provide benchmarks that reflect today’s embedded Internet of Things (IoT) applications’ needs. The initiative for establishing the benchmark suite came from Prof. David A. Patterson, one of the principal figures behind the RISC-V processor. The main idea is to move away from outdated, artificial benchmarks, such as the Dhrystone and CoreMark, which are no longer representative of modern embedded applications, and introduce a benchmark suite that will continuously evolve to keep up with the new trends in embedded systems.

Currently, EMBENCH includes 19 benchmarks carefully selected to be a representative range of the application space that is typically found in today’s IoT. The suite’s build-system supports both native (x-86, Linux) and cross-compilation of the benchmarks, currently for RISC-V and Arm’s Cortex-M4 based development boards.

Selection of Benchmarks

For a fair performance comparison between the programming languages C and SPARK, the underlying algorithmic logic and the main program structure of the original EMBENCH benchmarks had to be retained in their new correspondent SPARK versions. This limitation can significantly impact the effective use of features and capabilities of the Ada/SPARK languages. For example, casting from one scalar type to another scalar type of smaller range or shifting signed integers are things that you typically do not expect to be part of an Ada or SPARK program. Furthermore, there is a limitation on which level of SPARK assurance can be achieved on each benchmark that stems from the direct translation of the C code to firstly equivalent Ada code and then SPARK code; note that the Ada intermediate stage of translation is needed to facilitate a step-wise process and reduce the complexity of the translation.

The original code was not developed with SPARK's formal verification technology in mind. This can make it challenging to preserve the original code's logic and structure while enabling the verification tools to perform at their optimum level. Finally, in many cases, the lack of sufficient design documentation, particularly in the form of comments within the benchmarks' code, makes it challenging to apply SPARK contracts that capture the semantics of the code. Thus, we can only achieve at best the Silver level of SPARK, AoRTE, which is the default level aimed for high-assurance software.

Under the above considerations, eleven benchmarks, shown in the table below, were deemed applicable for this work. The table also shows the level of SPARK assurance achieved for each benchmark. If a benchmark is not at Silver, the number of its sub-modules that reached the Silver level is also given.


BenchmarkLevel Of SPARKNo. of Submodules at
Silver level
aha-mont64Silverall
crc32Silverall
ednSilverall
huffbenchSilverall
matmult-intSilverall
nettle-aesSilverall
nsichneuSilverall
stBronze3/4
udBronze0/1
minverBronze1/2
nbodyBronze1/2
Table 1. EMBENCH benchmarks converted to SPARK and their achieved level of SPARK.

Achieving the Different SPARK Levels

The Implementation Guidance for the Adoption of SPARK manual offers excellent guidance on how to achieve SPARK's several levels of software assurance. These guidelines form the basis for the white paper's work. Its purpose is not to give a comprehensive manual as this is already done in the aforementioned SPARK adoption manual, but rather to give enough context for the reader to understand the process and share the practical experiences of hardening software libraries with Ada and SPARK. Also, any significant steps/findings not found in the SPARK manual used as a starting point, such as a new SPARK feature, are highlighted.

The detailed stepwise process followed, and the several challenges and findings for each step of converting the benchmarks to Ada and SPARK and then achieving the different levels of SPARK software assurance can be found under Section 4.2 of the relevant white paper.

One thing I would like to highlight regarding how SPARK can be effectively applied to gain software assurance is the bottom-up modular approach that SPARK technology offers. The modularity is mainly two-fold:

  1. Being able to apply the three modes (check, flow analysis, proof) of GNATprove at the line, region, subprogram, and module (file) level.
  2. Each level of SPARK assurance achieved ensures the conformance of all of its lower SPARK assurance levels.

It is highly recommended to take a bottom-up modular approach when hardening software libraries with SPARK, such that before moving to a higher level of SPARK, all of the lower levels are achieved. Besides, smaller parts of the code, with the lowest amount of interaction, such as leaf-node subprograms in the call-tree of a program, should be targeted first with a gradual move towards the file level. This significantly reduces the effort of achieving the different levels of SPARK assurance.

Although for this work SPARK Pro and GNAT Pro versions 21.0w are used, GNAT Community 2021 is also applicable. GNAT Studio 21.0w was the Integrated Development Environment (IDE) used to convert the C benchmarks into SPARK and invoke the GNATprove SPARK static analyser.

Evaluation

The white paper includes a thorough discussion on the performance evaluation, issues captured by SPARK on the original benchmarks, assessment of the time needed for the completion of the SPARK-related tasks, and some issues found and improved within the SPARK technology. We will cover only the effort/time needed and the performance evaluation for this post.

Effort/Time Evaluation

One of the aims of this work was to evaluate the effort needed in achieving the absence of runtime errors with the SPARK technology. The completion of this work lasted around 35 working days. The level of experience with the Ada and SPARK technologies was around four months, although the engineer (myself) that carried out the work had an overall programming experience of about 15 years. Taking this into account and that the Silver level of SPARK was achieved within this time frame for the majority of the benchmarks, 7 out of 11, and for most of the functions for the remaining benchmarks, it is fair to say that SPARK technology is easily accessible and its adoption can yield significant benefits for hardening software libraries for security in a short time.

Performance Evaluation

The performance evaluation was done on an STM32F4DISCOVERY development board. The board is equipped with a 32-bit ARM Cortex-M4 with FPU core, 1-Mbyte Flash memory, 192-Kbyte RAM, and it can run at a maximum frequency of 168MHz. The ARM Cortex-M4 and the RISK-V 32-bit processors are the two embedded processors currently supported in the EMBENCH benchmark suite due to their popularity in the embedded industry.

Both in C and SPARK, the benchmarks were compiled with the GNAT Pro 21.0w compiler using the -O2 optimization flag and with link-time optimizations enabled. The link-time optimizations were essential to allow a fairer comparison between the C and the SPARK versions, as the SPARK code for each benchmark is separated from its test harness source file while the C benchmarks’ code lives in the same file as their test-harness. Thus, inlining of the benchmark code was feasible in the C versions and not in the SPARK versions. By enabling link-time optimizations, inlining was also enabled for the SPARK code similarly to the C code. Furthermore, Ada’s runtime checks were disabled, using the -gnatp flag, to make the performance comparison fair with C. Nevertheless, subprograms proved at the SPARK Silver level come with an AoRTE guarantee, and thus runtime checks can be safely disabled. This is essential towards certifying at the highest levels of software assurance of safety standards, such as the DO-178C for avionics or the CENELEC EN 50128 for rail systems. This is equally applicable when certifying against security-critical standards and guidelines such as ED-202A/ED-203A and DO-326A/DO-356A for aviation.

BenchmarkLevel Of SPARKSPARK vs C (Performance)
aha-mont64Silver-24.07%
crc32Silver0.00%
ednSilver8.56%
huffbenchSilver2.54%
matmult-intSilver4.27%
nettle-aesSilver10.91%
nsichneuSilver7.55%
stBronze-16.70%
udBronze9.49%
minverBronze-0.66%
nbodyBronze38.83%
Table 2. Performance comparison of the C and SPARK versions of each benchmark. Note that a positive percentage represents the percentage increase in execution time for a benchmark written in SPARK when compared to the execution time of its corresponding C version, and vice versa, a negative percentage represents the percentage decrease in execution time.


Table 2 and Figure 1 show the performance comparison for the C and SPARK versions of the benchmarks. Note that even though the "minver" original benchmark is found to be functionally incorrect, the SPARK level implementation matches that behaviour. Thus, a performance comparison is still valid. With the exception of the "nbody" benchmark, there is no significant sacrifice in performance when moving from C to SPARK. This is because there were no significant intrusive modifications needed to the code to support SPARK proves.

Figure 1. Performance comparison of the C and SPARK versions of each benchmark.

Conclusion

The hardening of existing code-bases is expected to become a commonplace security exercise due to the rise of industry-mandated cyber-security standards and guidelines. This is especially predicted within aerospace with the recent adoption by the European Union Aviation Safety Agency (EASA) of the ED-202A/ED-203A security set as the currently only “Acceptable Means of Compliance” for aviation cyber-security airworthiness certification. This is expected to be equally true of the Federal Aviation Administration (FAA) and the technically equivalent DO-326A/DO-356A set.

This is where SPARK can play an integral role in designing a new security architecture (or when security hardening an existing architecture). The white paper summarized by this post shows that existing systems can benefit from the application of a SPARK hardening approach. Elevation of a security component to SPARK Silver level or higher provides strong evidence to support a security effectiveness assurance argument. This is particularly true when arguing over the effectiveness of security measures against threat scenarios relating to application security vulnerabilities.

By design, SPARK aims to eradicate all security bugs, flaws, errors, faults, holes, or weaknesses in software architecture regardless of if threat actors can exploit them. When proven to have achieved Silver level or higher, the guaranteed AoRTE is a powerful countermeasure against cyber-attacks. This work demonstrated that the effort of adopting SPARK is not as hard as perceived; in an arguably short amount of time, a set of benchmarks from the EMBENCH benchmark suite were converted from C to SPARK, and AoRTE was achieved in most of the cases, (see Table 1). In addition, had the complete functional specifications been available for the remaining, not fully proven benchmarks, the complete set of benchmarks could have achieved the Silver level of SPARK assurance. The SPARK technology was also able to identify a faulty benchmark, namely the "minver", where its implementation was incomplete and produced the wrong results.

Furthermore, as demonstrated, the adoption of the SPARK technology did not significantly compromise the C-enabled performance. This and the significant security benefit of proving AoRTE make SPARK a default choice when it comes to the hardening of software libraries. Finally, the work demonstrates the steps and best practices for adopting SPARK and highlights the use of new features. These, and the provided references to external documentation, can be used as up-to-date guidelines for the easy adoption of the SPARK technology.

Acknowledgments

The "High-Integrity Complex Large Software and Electronic Systems'' (HICLASS) working group was created to ensure that the UK stays a leading force within civilian aerospace manufacturing. HICLASS is made up of UK academia, tier-one aerospace manufacturers, and associated software development tool providers. This group has been assembled to action the technology strategy set out by the Aerospace Technology Institute (ATI). AdaCore's involvement within this group is to evolve SPARK and associated technologies to ensure they align with the objectives set out within the ever-emerging cyber-security guidelines.

HICLASS is supported by the Aerospace Technology Institute (ATI) Programme, a joint Government and industry investment to maintain and grow the UK’s competitive position in civil aerospace design and manufacture, under grant agreement No. 113213. The programme, delivered through a partnership between the ATI, Department for Business, Energy & Industrial Strategy (BEIS), and Innovate UK, addresses technology, capability, and supply chain challenges. More information about the HICLASS project can be found here.


Posted in #SPARK    #STM32    #Embedded   

About Kyriakos Georgiou

Kyriakos Georgiou

Dr. Kyriakos Georgiou is a Senior Research Engineer at AdaCore UK. His role is to research new technologies that can enhance the existing capabilities of AdaCore technologies, such as Symbolic Execution and compiler hardening. Before joining AdaCore, Kyriakos was a senior research associate and lecturer at the University of Bristol, UK, where he researched energy estimation and optimization techniques for software and compiler optimization autotuning techniques.