Automated Assurance through Differential Fuzzing
by Paul Butcher
Differential testing verifies an application's correct behavior by comparing it against a reference baseline. In addition, this software assurance technique becomes a force multiplier when coupled with a fuzzer.
"Differential Fuzzing" brings multiple advantages to standard "Differential Testing"; namely anomaly detection, automated input creation and coverage-guided input mutation. If we add an automated fuzz test-harness generator/executor into the mix, we can drastically reduce the time to achieve the desired level of software assurance.
However, the main caveat here is that we need a reference model, and this approach will not work for greenfield development. In some cases, a rapid prototyping approach may result in a reference model before a formal implementation, or there may be a standard library that can be referred to. In addition, having a reference model will often be the case during mid-life updates that include software implementation language transitions.
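To make the idea concrete, here is a minimal, illustrative sketch of a differential test loop in Python. It is not GNATfuzz code: Python's built-in hashlib is used on both sides, with the streaming API standing in for the "implementation under test" and the one-shot API as the reference model.

```python
import hashlib
import os
import random

def reference_model(data: bytes) -> bytes:
    # One-shot hashing stands in for the trusted reference implementation
    return hashlib.sha3_256(data).digest()

def implementation_under_test(data: bytes) -> bytes:
    # The streaming API stands in for the re-implementation being verified;
    # it exercises a different code path through the same algorithm
    h = hashlib.sha3_256()
    for i in range(0, len(data), 7):  # feed the input in small chunks
        h.update(data[i:i + 7])
    return h.digest()

# The differential loop: random inputs, compare both sides on each one
for _ in range(1_000):
    data = os.urandom(random.randrange(0, 256))
    assert implementation_under_test(data) == reference_model(data), data.hex()
```

A fuzzer improves on this loop by replacing the purely random input generation with coverage-guided mutation, preferring inputs that reach new code paths.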
For some time now, we have seen new language adoption as a popular means of achieving security and safety assurance. A common approach is to adopt programming languages that focus on safety and security and to reimplement applications previously written in memory-unsafe languages.
In November last year, Daniel King blogged about his adventures rewriting a popular C-based Secure Hash Algorithm 3 (SHA-3) library in the SPARK programming language. Daniel showed how a severe security vulnerability found in the eXtended Keccak Code Package (XKCP) would have been avoided if the library had been developed in SPARK. The resulting SPARK library, Libkeccak, prevents the vulnerability (and any other runtime error) by leveraging SPARK's proof capabilities to prove that runtime errors cannot occur for any possible input to the library.
As noted in the blog, the aim was to develop Libkeccak to the SPARK Silver formal proof analysis level, which guarantees that the analyzed scope is free of runtime errors. This level of assurance is vital for critical software. However, unless the analysis is taken to a higher level, to show proof of key integrity properties or full functional proof of requirements, it does not guarantee that the application is functionally correct.
Daniel also states that:
"The library does not attempt to use proof to show that the cryptographic algorithms are implemented correctly. Instead, it is tested against 35,982 test vectors and 146 unit tests to verify that the library produces the correct results for each of the supported algorithms".
Therefore, this re-implementation provides a security-hardened library that is proven to be absent of runtime errors. In addition, the functional correctness of the SPARK cryptographic library is argued through unit-testing evidence.
Determining the strength of any functional correctness argument depends on the level of assurance you're trying to achieve, and mechanisms like coverage analysis can be used to bolster claims. However, the fact that the original XKCP library is a reference model for libkeccak immediately presents us with an interesting question:
"Can differential fuzz-testing increase the functional correctness assurance of Libkeccak?"
And because we're a very inquisitive bunch of engineers at AdaCore, we set out to find the answer!
This blog describes the process of setting up, executing, and analyzing the results of a differential fuzzing campaign for the Libkeccak and XKCP cryptographic libraries.
The first step in this exploratory journey was to set up a fuzz test development environment that we could use to build and execute Libkeccak tests. Luckily, this was a very straightforward process, as Libkeccak is available within the Ada LIbrary REpository (ALIRE). ALIRE aims to fulfil a similar role to Rust's cargo or OCaml's opam repositories and, in my opinion, it achieves it. To install and build Libkeccak, I just needed to clone the repository and execute the command "alr build".
The second step was to understand the Libkeccak API protocol so that I could configure and utilize one of the cryptographic algorithms. For this, I just reused the example from the repository README:
with Keccak.Types;
with SHA3;

function Compute_Hash (Data : in Keccak.Types.Byte_Array)
   return SHA3.SHA3_256.Digest_Type
is
   Ctx    : SHA3.SHA3_256.Context;
   Digest : SHA3.SHA3_256.Digest_Type;
begin
   SHA3.SHA3_256.Init (Ctx);
   SHA3.SHA3_256.Update (Ctx, Data);
   SHA3.SHA3_256.Final (Ctx, Digest);
   return Digest;
end Compute_Hash;
The signature of this test application gave us a good starting point. The Ada type Keccak.Types.Byte_Array is, as the name suggests, a representation of an array of bytes and, therefore, a data structure well suited to mutational fuzz testing. The final step was to create an Ada package for the test function and include it in the build process.
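For intuition, byte-array mutation is as simple as it sounds. A hypothetical fuzzer's mutation step might look like this Python sketch (illustrative only; AFL++ applies many more strategies, and chooses among them using coverage feedback):

```python
import random

def mutate(data: bytearray) -> bytearray:
    """A few of the classic byte-level mutations a fuzzer applies."""
    out = bytearray(data)
    choice = random.randrange(3)
    if choice == 0 and out:
        # Flip a random bit in a random byte
        i = random.randrange(len(out))
        out[i] ^= 1 << random.randrange(8)
    elif choice == 1:
        # Insert a random byte at a random position
        out.insert(random.randrange(len(out) + 1), random.randrange(256))
    elif choice == 2 and out:
        # Delete a random byte
        del out[random.randrange(len(out))]
    return out
```

Because a byte array carries no internal structure that mutations can break, almost every mutated input remains a valid test case for a hashing API.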
Step three involved generating a fuzz test harness for our new Compute_Hash function and, as discussed in previous blogs we can utilize the capabilities of GNATfuzz to automate this process.
To verify that the subprogram is fuzzable, we must execute GNATfuzz's "analyze" mode. The command we used is very straightforward:
gnatfuzz analyze -P libkeccak.gpr
All we need to do is pass the GPR file of Libkeccak to the analyze mode of GNATfuzz via the "-P" switch. This generates a results file identifying all of the "fuzzable" components of the application. Searching through the list, we identified (as expected) that the Compute_Hash function can act as a fuzz test entry point.
To generate the harness, we identified the subprogram ID from the list of fuzzable subprograms that correlated with the Compute_Hash function and ran the following command:
gnatfuzz generate -P libkeccak.gpr --analysis analyze.json --subprogram-id 1
Running the "generate" command concludes this step by successfully (and automatically) creating a complete fuzz test build and execution environment.
At this point, we could immediately run a fuzz test via the GNATfuzz "fuzz" command. However, this would only allow us to fuzz for standard anomaly detection (i.e. runtime constraint check failures and segmentation faults). Ordinarily, this is a valuable assurance activity. However, in this instance, the SPARK analyzer has already formally proven the algorithm absent of runtime errors; therefore, we wouldn't expect the fuzz test to find anything useful.
Step four involves configuring the differential aspects of the test and connecting up the equivalent XKCP SHA3 256-bit cryptographic algorithm. GNATfuzz simplifies this by generating a "User Configuration" package within the test harness framework that allows you to tailor the fuzz test to your specific needs. This package is a child Ada package of the package hosting the subprogram under test (in this case, Compute_Hash). It includes a number of hooks into the GNATfuzz test-execution runtime for manipulating the test; the one that plays a key role in differential fuzzing is called "Teardown". Much like the unit-test concept of test teardown, this subprogram is used to free or reset memory between test executions. In addition, the GNATfuzz runtime provides the subprogram with the mutated input data and the result of the call to the function under test. In our example, the subprogram signature looks like this:
procedure Teardown
  (GNATfuzz_Data   : Keccak.Types.Byte_Array;
   GNATfuzz_Return : SHA3.SHA3_256.Digest_Type);
Therefore, to verify the return value of the function under test against the reference model, we can feed the same input "GNATfuzz_Data" into the XKCP SHA3 256-bit cryptographic algorithm and check that the returned message digest matches the Libkeccak result "GNATfuzz_Return".
As previously stated, XKCP is a library of cryptographic algorithms developed in the C programming language. Therefore, the next job is to create an Ada thin binding to invoke the corresponding C function. The GCC compiler's -fdump-ada-spec switch can help here. To reduce the complexity of the binding, the self-contained version of XKCP (CompactFIPS202) was analyzed. The analysis resulted in an Ada package specification that included the following procedure.
procedure FIPS202_SHA3_256
  (c_in  : access u8;
   inLen : u64;
   c_out : access u8)  -- Keccak-more-compact.c:10
with Import        => True,
     Convention    => C,
     External_Name => "FIPS202_SHA3_256";
To call the thin binding with the correct values, we need to convert from the Libkeccak Ada types to the C types (via the Ada thin binding). To keep the implementation nice and tidy, we created a "thick binding":
function FIPS202_SHA3_256_Thick_Binding
  (Data : Keccak.Types.Byte_Array) return SHA3.SHA3_256.Digest_Type
is
   type Input_C_Style_Byte_Array is
     array (Data'Range) of aliased Keccak_more_compact_c.u8;
   -- C definition of the Ada Byte_Array compatible with the thin
   -- binding to the XKCP library API

   function Ada_Byte_Array_To_C_Byte_Array is new Ada.Unchecked_Conversion
     (Source => Keccak.Types.Byte_Array,
      Target => Input_C_Style_Byte_Array);
   -- Instantiate an Ada.Unchecked_Conversion to convert between the Ada
   -- and C versions of the byte array

   Input_Data_As_C_Array : aliased Input_C_Style_Byte_Array :=
     Ada_Byte_Array_To_C_Byte_Array (Data);
   -- Convert the input data into a C array compatible with the thin
   -- binding

   Digest_Size_256 : constant Keccak.Types.Index_Number :=
     256 / System.Storage_Unit;

   subtype Output_Range is Keccak.Types.Index_Number range
     Keccak.Types.Index_Number'First .. Digest_Size_256 - 1;
   -- Constrain the index of the returned data to a 256 bit range

   type Output_C_Style_Byte_Array is
     array (Output_Range) of aliased Keccak_more_compact_c.u8;
   -- Define a C definition of the Ada Byte_Array compatible with the thin
   -- binding to the XKCP library API

   Output_Data_As_C_Array : aliased Output_C_Style_Byte_Array :=
     (others => Keccak_more_compact_c.u8'First);
   -- Create an array to hold the message digest calculated by the XKCP
   -- version of the algorithm

   subtype Constrained_Ada_Style_Byte_Array is
     Keccak.Types.Byte_Array (Output_Range);
   -- In order to convert the returned C array into an Ada style array we
   -- need to first constrain it

   function C_Byte_Array_To_Ada_Byte_Array is new Ada.Unchecked_Conversion
     (Source => Output_C_Style_Byte_Array,
      Target => Constrained_Ada_Style_Byte_Array);
   -- Instantiate an Ada.Unchecked_Conversion to convert results from the
   -- XKCP algorithm into the same type as the Libkeccak version. This will
   -- allow us to easily compare the message digests.

begin
   -- Call the thin binding with the pointer values of the input and output
   -- buffers
   Keccak_more_compact_c.FIPS202_SHA3_256
     (c_in  => Input_Data_As_C_Array (Input_Data_As_C_Array'First)'Access,
      inLen => Data'Length,
      c_out => Output_Data_As_C_Array (Output_Data_As_C_Array'First)'Access);

   -- Convert and return the result
   return C_Byte_Array_To_Ada_Byte_Array (Output_Data_As_C_Array);
end FIPS202_SHA3_256_Thick_Binding;
The safe data conversion into the corresponding thin-binding types is straightforward: we know the data type sizes are equivalent, so we can safely use instantiations of Ada.Unchecked_Conversion to convert between the two byte-array types.
We can now go back and complete the Teardown procedure:
procedure Teardown
  (GNATfuzz_Data   : Keccak.Types.Byte_Array;
   GNATfuzz_Return : SHA3.SHA3_256.Digest_Type)
is
   use type SHA3.SHA3_256.Digest_Type;
   use Ada.Characters.Latin_1;

   XKCP_Result : SHA3.SHA3_256.Digest_Type :=
     FIPS202_SHA3_256_Thick_Binding (Data => GNATfuzz_Data);
   -- Call the XKCP compact SHA3 256-bit hash function
begin
   -- If the results don't match, raise a test verification exception.
   -- This will be caught by the fuzz-test harness and translated into a
   -- core dump. This signals to the AFL++ fuzzer backend that a test
   -- anomaly has been detected.
   if GNATfuzz_Return /= XKCP_Result then
      declare
         Verification_Failure_Message : constant String :=
           "Error! XKCP generated a different result!" & LF &
           "LibKeccak Result :" & GNATfuzz_Return'Image & LF &
           "XKCP Result :" & LF & XKCP_Result'Image;
         Test_Exception : exception;
         -- Raised if the Libkeccak result doesn't match the XKCP result
      begin
         raise Test_Exception with Verification_Failure_Message;
      end;
   end if;
end Teardown;
Now, each time we execute a data mutation through the Compute_Hash function, we are not just checking for anomalies but also verifying the functional correctness against the original XKCP reference model!
To further the work, we can easily utilise Ada's powerful generics feature to reuse the thick-binding code for all of the algorithms in the XKCP reference model. To make the function generic, we just need to identify the components that are unique to each algorithm, specifically:
- The return type of the calculated digest
- The size of the digest in bits
- The associated thin binding function
This gives us the following specification:
type FIPS202_Algorithm_Callback_Type is not null access procedure
  (c_in  : access Keccak_more_compact_c.u8;
   inLen : Keccak_more_compact_c.u64;
   c_out : access Keccak_more_compact_c.u8)
with Convention => C;
-- Callback used to invoke the thin binding to the XKCP algorithm

generic
   type Return_Digest_Type (<>) is new Keccak.Types.Byte_Array;
   Digest_Size_Bits           : Keccak.Types.Index_Number;
   FIPS202_Algorithm_Callback : FIPS202_Algorithm_Callback_Type;
function FIPS202_Generic_Thick_Binding
  (Data : Keccak.Types.Byte_Array) return Return_Digest_Type;
-- Generic thick binding to any SHA3 XKCP algorithm
We can then easily instantiate and call the generic thick-binding functions within the Teardown procedure like this:
use type SHA3.SHA3_256.Digest_Type;

-- Instantiate thick bindings for each of the C algorithms

function FIPS202_SHA3_224_Thick_Binding is new FIPS202_Generic_Thick_Binding
  (Return_Digest_Type         => SHA3.SHA3_224.Digest_Type,
   Digest_Size_Bits           => 224,
   FIPS202_Algorithm_Callback =>
     Keccak_more_compact_c.FIPS202_SHA3_224'Access);
-- XKCP SHA3_224 algorithm

function FIPS202_SHA3_256_Thick_Binding is new FIPS202_Generic_Thick_Binding
  (Return_Digest_Type         => SHA3.SHA3_256.Digest_Type,
   Digest_Size_Bits           => 256,
   FIPS202_Algorithm_Callback =>
     Keccak_more_compact_c.FIPS202_SHA3_256'Access);
-- XKCP SHA3_256 algorithm

function FIPS202_SHA3_384_Thick_Binding is new FIPS202_Generic_Thick_Binding
  (Return_Digest_Type         => SHA3.SHA3_384.Digest_Type,
   Digest_Size_Bits           => 384,
   FIPS202_Algorithm_Callback =>
     Keccak_more_compact_c.FIPS202_SHA3_384'Access);
-- XKCP SHA3_384 algorithm

function FIPS202_SHA3_512_Thick_Binding is new FIPS202_Generic_Thick_Binding
  (Return_Digest_Type         => SHA3.SHA3_512.Digest_Type,
   Digest_Size_Bits           => 512,
   FIPS202_Algorithm_Callback =>
     Keccak_more_compact_c.FIPS202_SHA3_512'Access);
-- XKCP SHA3_512 algorithm

SHA3_224_XKCP_Result : SHA3.SHA3_224.Digest_Type :=
  FIPS202_SHA3_224_Thick_Binding (Data => GNATfuzz_Data);
-- Call the XKCP compact SHA3 224-bit hash function

SHA3_256_XKCP_Result : SHA3.SHA3_256.Digest_Type :=
  FIPS202_SHA3_256_Thick_Binding (Data => GNATfuzz_Data);
-- Call the XKCP compact SHA3 256-bit hash function

SHA3_384_XKCP_Result : SHA3.SHA3_384.Digest_Type :=
  FIPS202_SHA3_384_Thick_Binding (Data => GNATfuzz_Data);
-- Call the XKCP compact SHA3 384-bit hash function

SHA3_512_XKCP_Result : SHA3.SHA3_512.Digest_Type :=
  FIPS202_SHA3_512_Thick_Binding (Data => GNATfuzz_Data);
-- Call the XKCP compact SHA3 512-bit hash function

...
The actual test then needs to be updated to ensure all of the SHA3 algorithms are called, as follows:
---------------
-- SHA3_Test --
---------------

function SHA3_Test (Data : in Keccak.Types.Byte_Array) return SHA3_Digests is
   SHA3_Ctx_224 : SHA3.SHA3_224.Context;
   SHA3_Ctx_256 : SHA3.SHA3_256.Context;
   SHA3_Ctx_384 : SHA3.SHA3_384.Context;
   SHA3_Ctx_512 : SHA3.SHA3_512.Context;
   Return_Value : SHA3_Digests;
begin
   -- SHA3_224
   SHA3.SHA3_224.Init (SHA3_Ctx_224);
   SHA3.SHA3_224.Update (SHA3_Ctx_224, Data);
   SHA3.SHA3_224.Final (SHA3_Ctx_224, Return_Value.SHA3_224_Digest);

   -- SHA3_256
   SHA3.SHA3_256.Init (SHA3_Ctx_256);
   SHA3.SHA3_256.Update (SHA3_Ctx_256, Data);
   SHA3.SHA3_256.Final (SHA3_Ctx_256, Return_Value.SHA3_256_Digest);

   -- SHA3_384
   SHA3.SHA3_384.Init (SHA3_Ctx_384);
   SHA3.SHA3_384.Update (SHA3_Ctx_384, Data);
   SHA3.SHA3_384.Final (SHA3_Ctx_384, Return_Value.SHA3_384_Digest);

   -- SHA3_512
   SHA3.SHA3_512.Init (SHA3_Ctx_512);
   SHA3.SHA3_512.Update (SHA3_Ctx_512, Data);
   SHA3.SHA3_512.Final (SHA3_Ctx_512, Return_Value.SHA3_512_Digest);

   return Return_Value;
end SHA3_Test;
Note that the return type is now a simple record to hold all of the calculated cryptographic digests:
type SHA3_Digests is record
   SHA3_224_Digest : SHA3.SHA3_224.Digest_Type;
   SHA3_256_Digest : SHA3.SHA3_256.Digest_Type;
   SHA3_384_Digest : SHA3.SHA3_384.Digest_Type;
   SHA3_512_Digest : SHA3.SHA3_512.Digest_Type;
end record;
-- Hold the results of applying the different SHA3 algorithms
-- on the byte array
Execution of the differential fuzzing campaign now allows us to compare the results of all of the SHA3 cryptographic algorithms for every data mutation, with a minimal performance overhead.
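Conceptually, the per-mutation check now fans out across all four digest widths. A Python sketch illustrates the fan-out (again purely illustrative, with hashlib's built-in SHA-3 variants standing in for both libraries):

```python
import hashlib
import os

ALGORITHMS = ("sha3_224", "sha3_256", "sha3_384", "sha3_512")

def all_digests(data: bytes) -> dict:
    # Stand-in for SHA3_Test: compute every digest for one input
    return {name: hashlib.new(name, data).digest() for name in ALGORITHMS}

def check_against_reference(data: bytes, results: dict) -> None:
    # Stand-in for Teardown: recompute each digest with the reference
    # model and flag any divergence
    for name, digest in results.items():
        reference = hashlib.new(name, data).digest()
        if digest != reference:
            raise AssertionError(f"{name} mismatch on input {data.hex()}")

data = os.urandom(32)
check_against_reference(data, all_digests(data))  # all four must agree
```

Because each mutation already pays the cost of one test execution, checking four digests instead of one multiplies the comparison count at little extra expense.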
So, to get back to the original question: "Can differential fuzz-testing increase the functional correctness assurance of Libkeccak?", we need to consider the assurance level achieved through Daniel's original work. As stated, the Libkeccak verification suite is made up of 35,982 test vectors and 146 unit tests. Therefore, our fuzzing campaign should include a target stop rule that complements and improves on these figures. As a demonstration of the capabilities of differential fuzzing, we set the stopping rule at roughly 10 times the original test-vector count per fuzzing node. This amounts to 360 thousand executions on each of the 15 processor cores available on the test hardware.
<rule>
  <state>execs_done</state>
  <operator>gt</operator>
  <value>360000</value>
  <all_sessions>true</all_sessions>
</rule>
This amounts to over 5 million test executions. However, as each processing node is allowed to continue until all of the processing nodes have executed more than 360 thousand mutations, the actual figure will be much higher. This seems like a reasonable demonstration that we've added assurance to the library!
The fuzzing campaign was then executed and the following results observed:
Libkeccak/XKCP Differential Fuzzing Campaign Results
Total comparisons of Libkeccak and XKCP algorithm outputs : 45,360,008
Total time to satisfy stopping criteria                   : 8 minutes
Average executions per second                             : 35,886
Unique paths found                                        : 53
Test crashes detected                                     : 0
Hung tests detected                                       : 0
Differences between Libkeccak and XKCP detected           : 0
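The headline figures are internally consistent. A quick back-of-the-envelope check (assuming, as described above, that each test execution compares all four SHA3 digests):

```python
comparisons = 45_360_008   # total digest comparisons from the results table
algorithms = 4             # SHA3-224/256/384/512 compared per execution
cores = 15                 # fuzzing nodes on the test hardware
stop_rule = 360_000        # minimum executions required per node

executions = comparisons // algorithms
print(executions)                       # 11340002 test executions
assert executions > 11_000_000          # matches "over 11 million" below
assert executions / cores > stop_rule   # on average, each node comfortably
                                        # exceeded the stop rule
```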
To conclude: after over 11 million test executions through the Libkeccak and XKCP SHA3 cryptographic algorithms, GNATfuzz did not find a single bug. As Libkeccak was developed in SPARK and verified to the Silver level, it was already proven to be absent of runtime errors. In addition, through the assurance provided by the original Libkeccak verification suite, there was already a reasonable argument that the algorithms are functionally correct. The additional evidence provided by this differential fuzzing campaign bolsters that claim and allows us to conclude that differential fuzzing can increase software assurance levels.
There is also more that we could do to raise the assurance further:
- Run the campaign for longer
- Run the campaign on more cores
- Utilise the GNATfuzz SHA_Test.GNATFuzz_User_Configuration.User_Provided_Seeds procedure to create a more interesting starting corpus that varies the length of the byte arrays
In addition, it would be interesting to run the test again but this time use the branch of XKCP that contains the vulnerability Dan mentioned, and see how long it takes the fuzzer to identify CVE-2022-37454.
But that's for another day!
If you're interested in learning more about GNATfuzz, differential fuzzing, or fuzz testing in general, please check out our GNATfuzz webpage or get in touch with me directly.