Fuzz Testing in International Aerospace Guidelines
by Paul Butcher
For obvious reasons, civilian aerospace is steeped in safety regulation. Long-standing international governing bodies mandate and oversee the specification, design, and implementation of civil avionics such that failure conditions that could lead to safety hazards are identified, assessed, and mitigated.
During FuzzCon Europe 2021, Paul Butcher talked about why international aerospace regulatory bodies felt that additional guidelines combining aviation safety and security were needed, in the form of an "Airworthiness Security Process".
Paul Butcher on fuzz testing in international aerospace guidelines.
Through the HICLASS UK research group, AdaCore has been developing security-focused software development tools that are aligned with the objectives stated within the avionics security standards. In addition, AdaCore has been developing further guidelines describing how vulnerability identification and security assurance activities can be captured within a Plan for Security Aspects of Certification.
What Do We Mean by Airworthiness?
The number one priority with civilian air travel is human safety. Everything else is secondary and, while other factors, including security, are important, none of them will ever be placed before human safety. In this context, human safety is focused on any persons involved in the operation of the air vehicle (i.e., passengers, the flight crew, the ground crew, etc.).
This regulation-enforced approach to air travel is set out in international legislation and can be summarised by the term "Airworthiness". Operators wanting to fly their air vehicles need to gain airworthiness approval from the regulatory authority responsible for the airspace they want to travel within. More specifically, these authorities include the Federal Aviation Administration (FAA), part of the U.S. Department of Transportation, the European Union Aviation Safety Agency (EASA) for air corridors across Europe, and others.
Ensuring our air vehicles are airworthy, and therefore safe for flight, is a challenge. However, a bigger challenge is convincing the regulatory authorities that the vehicles are safe! Here, approaches like safety cases are used to document clear, concise, and convincing safety arguments. The goal of these arguments is to convince the certification authority that the risk of an air-vehicle system failure (that could lead to a safety hazard) is as low as reasonably practicable.
Fortunately, the regulatory authorities provide help in the form of "Advisory Circulars" (ACs), which stipulate that certain standards and guidelines are deemed an acceptable means of compliance (AMC) with specific aspects of airworthiness. DO-178C, titled "Software Considerations in Airborne Systems and Equipment Certification", is a prime example and is often used to gain approval of the safe usage of commercial software-based aerospace systems.
It is also fair to say that, even considering the complex, thorough, and mandated regulatory safety processes, the industry is very good at achieving airworthiness certifications. This is good, and we should all sleep better knowing that these safeguards are in place!
Security Trends in Modern Civil Avionics
So, if civilian air travel is already very safe, why do we need an "Airworthiness Security Process"? This can be partially answered by a keynote address made by Robert Hickey during the 2017 CyberSat Summit:
“We got the airplane on Sept. 19, 2016. Two days later, I was successful in accomplishing a remote, non-cooperative, penetration” (Ref: Aviation Today)
In order to understand the full context of that statement, I would encourage you to read the full article. However, what was fascinating to me about this hack was that Robert Hickey stated that this was not conducted in a laboratory but on a civilian aircraft parked at the airport in Atlantic City.
“[Which] means I didn’t have anybody touching the airplane, I didn’t have an insider threat. I stood off using typical stuff that could get through security and we were able to establish a presence on the systems of the aircraft.”
What is maybe more worrying is that the report goes on to imply that the involved Avionics Original Equipment Manufacturers (OEMs) later declared they were aware of the exploit path, as well as many others.
Another interesting statement made by Robert Hickey during that keynote address concerned the staggering estimated cost of patching software in a deployed avionics system:
“The cost to change one line of code on a piece of avionics equipment is $1 million, and it takes a year to implement.”
Clearly, this emphasises the obvious, if secondary, need for the aircraft industry to construct aircraft systems that are secure as well as safe.
Why the Need for Aviation Security Standards?
One likely reason this situation arose is the terminology used within the existing safety guidelines. "Failure Conditions" are widely understood within the industry to be scenarios that directly affect the vehicle and/or its occupants. These conditions are caused by internal failures, system errors, environmental operating conditions, extreme external events such as atmospheric conditions, and other scenarios such as bird strikes and baggage fires.
In order to gain airworthiness, all aircraft Failure Conditions need to be identified, analysed, and understood such that the resulting effect can be categorised, associated with any known safety hazard, and mitigated if appropriate.
Failure Conditions are then classified into the following severity categories (a small illustrative modelling sketch follows the list):
- Catastrophic - Failure may cause deaths, usually with loss of the airplane.
- Hazardous - Failure has a large negative impact on safety or performance, or reduces the ability of the crew to operate the aircraft due to physical distress or a higher workload, or causes serious or fatal injuries among the passengers.
- Major - Failure significantly reduces the safety margin or significantly increases crew workload. May result in passenger discomfort (or even minor injuries).
- Minor - Failure slightly reduces the safety margin or slightly increases crew workload. Examples might include causing passenger inconvenience or a routine flight plan change.
- No Effect - Failure has no impact on safety, aircraft operation, or crew workload.
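To make the classification concrete, here is a minimal sketch, assuming nothing beyond standard Ada, of how these severity categories might be modelled so that hazard assessments or test findings can be tagged and compared programmatically. The package, the type, and the Requires_Mitigation policy are illustrative assumptions and are not taken from any standard or AdaCore tool.

```ada
--  Illustrative only: a hypothetical Ada model of the failure condition
--  severity categories described above.  Neither the package nor the
--  mitigation policy is taken from any standard or AdaCore tool.
package Hazard_Model is

   type Failure_Condition_Severity is
     (No_Effect,      --  No impact on safety, aircraft operation, or crew workload
      Minor,          --  Slightly reduced safety margin or slightly increased workload
      Major,          --  Significantly reduced safety margin or increased workload
      Hazardous,      --  Large negative impact on safety or crew ability to operate
      Catastrophic);  --  May cause deaths, usually with loss of the airplane

   --  Ordering the literals from least to most severe lets ordinary
   --  comparison operators express simple policy checks (assumed policy).
   function Requires_Mitigation (S : Failure_Condition_Severity) return Boolean is
     (S >= Major);

end Hazard_Model;
```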
The problem, however, is that there is no explicit consideration of cyber-threats acting as events that can lead to a failure condition. In order to address this shortfall, two new working groups within RTCA and EUROCAE were formed and tasked with producing an Airworthiness Security Process.
This led to the birth of a set of standards and guidelines widely known as the ED-202A/DO-326A set, and an early action of this joint committee was to bring a new term to the table, namely a "Threat Condition".
"A condition having an effect on the aeroplane and/or its occupants, either direct or consequential, which is caused or contributed to by one or more acts of intentional unauthorised electronic interaction, involving cyber threats, considering flight phase and relevant adverse operational or environmental conditions. Also see failure condition." (Ref: ED-202A/DO-326A)
Here the terminology is deliberately succinct; a Threat Condition focuses on the effect of a cyberattack on the air vehicle. This also makes it very clear that the primary purpose of the Airworthiness Security Process is to ensure the safety of flight.
Airworthiness Security Process (AWSP)
The process comprises seven main stages broken down into sub-stages with identified stage inputs and stage outputs. The initial phase is known as the "Plan for Security Aspects of Certification" (PSecAC), and it is here that we set our security goals and describe how we intend to test the security of our application. Much like a "Plan for Safety Aspects of Certification", we need to ensure our regulatory authority accepts our plan before we commence the development and test phases.
However, the narrative of the process should not be considered in any way linear. Instead, it tends to jump between sub-stages and loop around groups of stages as security assurance is reassessed, risk mitigation readdressed, and security development reworked. The process is too complex to cover in detail within this blog post. However, one area of particular interest, where fuzz testing can play a crucial role, is the "Security Effectiveness Assurance" stage.
Security Effectiveness Assurance
This phase aims to show compliance with security requirements and to evaluate the effectiveness of implemented security measures. More specifically, we need to verify that we have satisfied any explicit security requirements, demonstrate the effectiveness of any security measures to protect our identified security assets, and provide evidence to argue that our system is free of vulnerabilities. Note that in the context of ED-202A/DO-326A, the definition of vulnerability states that it has to be demonstrably exploitable.
Fuzz testing is one such means of meeting objectives within the Security Effectiveness Assurance phase, for three primary reasons (an illustrative harness sketch follows the list):
- Fuzz testing can assess the effectiveness of a security measure
- Fuzz testing can identify vulnerabilities in the form of exploitable software bugs
- and therefore, fuzz testing can help identify security assets
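To make these points a little more tangible, below is a purely illustrative, self-contained sketch of what a file-driven fuzz-test entry point for an Ada subprogram might look like, assuming an AFL-style engine that runs the program once per generated input. The Process_Message placeholder and the overall structure are assumptions for this example; this is not the harness code that GNATfuzz generates.

```ada
--  A purely illustrative fuzz-test entry point, assuming an AFL-style
--  engine that runs the program once per generated input file.  The
--  Process_Message placeholder and this structure are assumptions; this
--  is not the harness code that GNATfuzz generates.

with Ada.Command_Line;       use Ada.Command_Line;
with Ada.Streams;            use Ada.Streams;
with Ada.Streams.Stream_IO;  use Ada.Streams.Stream_IO;

procedure Fuzz_Entry is

   --  Placeholder for the real subprogram under test, e.g. a message
   --  parser sitting behind a security measure.
   procedure Process_Message (Data : Stream_Element_Array) is
   begin
      null;  --  The code under test would go here.
   end Process_Message;

   Input  : Ada.Streams.Stream_IO.File_Type;
   Buffer : Stream_Element_Array (1 .. 4096);
   Last   : Stream_Element_Offset := 0;

begin
   --  The fuzzing engine supplies one candidate input per run.
   Open (Input, In_File, Argument (1));
   Read (Input, Buffer, Last);
   Close (Input);

   --  Any unhandled exception raised below, whether an explicit
   --  validation failure or one of Ada's implicit run-time checks,
   --  is observed by the engine and recorded as a finding.
   Process_Message (Buffer (1 .. Last));
end Fuzz_Entry;
```

The harness deliberately does nothing clever itself: its only job is to deliver untrusted bytes to the code under test, so that failures of a security measure, or of the language's own checks, become observable findings.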
DO-356A / ED-203A and the Introduction of Security Refutation
The ED-202A/DO-326A Airworthiness Security Process is supported by a set of guidelines stated within ED-203A/DO-356A, titled "Airworthiness Security Methods and Considerations". Here, the reader is introduced to the term "Refutation". The aim of the Refutation phase is to assess the security assurance of the system under test.
Refutation is all about refuting that the system is secure, and this negative take on standard verification-based testing (positive testing) is very deliberate. The intention is to direct the focus of the activity towards the mindset of an attacker. It is advised that multiple activities should be adopted to make up the Refutation testing phase, and the guidelines suggest that the following should be considered:
- Security penetration testing
- Fuzzing
- Static code analysis
- Dynamic code analysis
- Formal proofs
Fuzz testing is traditionally considered a negative testing capability and is therefore particularly well suited to refutation testing. Unfortunately, the guidelines around how to include a fuzzing campaign within a PSecAC are lacking. An industrial working group within HICLASS highlighted this gap and made a clear appeal for a better understanding of the technology.
Guidelines and Considerations Around ED-203A / DO-356A Security Refutation Objectives
In order to meet this industry need, AdaCore produced a technical paper providing additional considerations and guidance on how to include a security refutation activity (including fuzz testing) within a PSecAC.
The paper is freely available via AdaCore's tech papers website, and we gratefully accept any feedback that experts in the field of fuzz testing want to provide. One area of particular interest within this paper is the recommendation that a fuzzing campaign plan includes both a "starting criteria" and a "stopping criteria".
The starting criteria focus on the quality of a particular fuzz test’s starting corpus and argue that a good aim is to achieve 100% statement coverage. Once the starting criteria have been satisfied, we can commence the fuzzing campaign until the stopping criteria are met.
The stopping criteria guidelines state that a formula should be derived to determine the campaign duration. The formula's goal is to argue that the resulting duration is commensurate with the level of security assurance the test is trying to achieve (a purely illustrative sketch follows the list). Factors should include (but not be limited to):
- the average achievable test execution speed;
- the security assurance level of the targeted security measures;
- the cyclomatic complexity of the control flow of the application under test;
- the measured complexity of the test input data structure.
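As an illustration only, the short sketch below shows one way such a formula might combine the factors listed above into a campaign duration. The scaling, weighting, and example values are assumptions invented for this post and are not the formula recommended in the AdaCore paper.

```ada
--  A hypothetical sketch of a stopping-criteria calculation.  The
--  guidelines ask for a project-specific formula derived from factors
--  like those listed above; the scaling, weighting, and example values
--  below are invented for illustration.

with Ada.Text_IO; use Ada.Text_IO;

procedure Campaign_Duration is

   --  Example inputs (all values are assumptions):
   Execs_Per_Second      : constant Float := 250.0;  --  average achievable test execution speed
   Assurance_Weight      : constant Float := 3.0;    --  numeric weight for the targeted security assurance level
   Cyclomatic_Complexity : constant Float := 180.0;  --  of the control flow under test
   Input_Complexity      : constant Float := 40.0;   --  e.g. number of fields in the input structure

   --  Illustrative rule: require a number of test executions that grows
   --  with the assurance weight and with both complexity measures, then
   --  convert it to wall-clock hours using the measured execution speed.
   Required_Executions : constant Float :=
     1.0E5 * Assurance_Weight * Cyclomatic_Complexity * Input_Complexity;

   Duration_Hours : constant Float :=
     Required_Executions / (Execs_Per_Second * 3600.0);

begin
   Put_Line ("Minimum campaign duration (hours):" & Float'Image (Duration_Hours));
end Campaign_Duration;
```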
GNATfuzz for Airworthiness Security Assurance
Within HICLASS, AdaCore has researched and developed a fuzz testing capability for applications written in Ada and SPARK. GNATfuzz is being developed with security effectiveness assurance and security refutation objectives at the forefront of its high-level requirements. A secondary aim is to ensure that the complexity of setting up, building, and executing a fuzzing campaign is encapsulated away from the user; this is achieved through a high level of test harness code automation.
More information about GNATfuzz and why Ada's rich runtime constraint checking capability makes it an excellent language of choice for fuzz testing can be found within the following AdaCore blog. In addition, if you would like to hear further thoughts from AdaCore about fuzz testing, please have a listen to our interview by Philip Winston on the IEEE Software Engineering Radio Podcast.
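As a hypothetical illustration of that last point, consider the small Ada subprogram below, in which a length field taken from an untrusted message is used without validation. The message layout and names are assumptions for this example; the point is that where a C implementation might silently read out of bounds, Ada's index check raises Constraint_Error, giving a fuzzer an immediately observable failure.

```ada
--  An assumed example of the kind of defect that Ada's run-time checks
--  expose during fuzzing.  The message layout and names are hypothetical.

with Ada.Streams;  use Ada.Streams;
with Ada.Text_IO;  use Ada.Text_IO;

procedure Checked_Extract is

   --  Returns the payload whose length is declared by the first byte of
   --  the message.  The length field is trusted without validation.
   function Payload (Msg : Stream_Element_Array) return Stream_Element_Array is
      Declared_Length : constant Stream_Element_Offset :=
        Stream_Element_Offset (Msg (Msg'First));
   begin
      --  If a fuzzed input declares a length larger than the buffer, the
      --  slice below is out of range.  In C this could be a silent
      --  out-of-bounds read; here the index check raises Constraint_Error,
      --  which a fuzzer observes and records immediately.
      return Msg (Msg'First + 1 .. Msg'First + Declared_Length);
   end Payload;

   --  A well-formed message: length field 3, followed by three payload bytes.
   Sample : constant Stream_Element_Array (1 .. 4) := (3, 10, 20, 30);

begin
   Put_Line ("Payload length:" & Integer'Image (Payload (Sample)'Length));
end Checked_Extract;
```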
For a demonstration of the capabilities of this tool and the entire Fuzz Con Europe 2021 talk, please follow this link. To learn more about the HICLASS initiative, please look here.
Final Thoughts...
Fuzz testing is not a traditionally used technique within aerospace. However, the emergence of security guidelines such as ED-202A/DO-326A forces the industry to think again. Mature software testing approaches now need to adapt to new regulatory requirements around cyber threats.
In addition, avionics software development life-cycle plans now need to include considerations around security assurance testing and identifying exploitable software bugs. Where other industries, such as automotive and IoT, have been early adopters of fuzzing, aerospace is now playing catch up.