The End of Binary Protocol Parser Vulnerabilities

by Fabien Chouteau – Mar 28, 2023

This week we announced a new tool RecordFlux. The goal of RecordFlux is to address one of the most critical parts of the software stack in terms of security, binary protocol parsers/serializers.

From a protocol specification written in the RecordFlux Domain Specific Language (DSL), the tool can generate provable SPARK code. This means memory safety (no buffer overruns), absence of integer overflow errors, and even proof of functional properties. In this blog post I will try to explain how this is a game changer for cybersecurity.

We will invent a trivial protocol for a remote service to push opaque data (i.e., “raw” bytes that will be appropriately interpreted by application code) onto a stack in a remote server, and then see how you can use RecordFlux to generate binary communication protocol parsers/synthesizes and prove the correct usage with SPARK. Our trivial protocol will be aptly named Remote Stack Protocol (RSP).

The examples I am going to show here require recent GNAT Pro compiler and SPARK Pro releases as RecordFlux is using bleeding edge features from both technologies.

Protocol specification: request messages

In this post we will look at the request side of things, i.e. messages that the client will send to the server. We have to specify the structure of the messages using the RecordFlux DSL.

First we define an enumeration type that represents the different kinds of requests we want to support. We have two kinds of requests:

Request_Store to store data on the stack (encoded in 8 bits as the value 21)
Request_Get to retrieve data from the stack (analogously encoded as 254)

type Request_Kind is (Request_Store => 21,
                      Request_Get   => 254)
   with Size => 8;

As you can see, we can also specify the size of the type in bits.

Then we define a type for the length of the payload that will be included in the message:

type Payload_Length is range 0 .. 255
   with Size => 8;

Finally we can define the format of request messages:

type Request_Message is
   message
      Kind   : Request_Kind
         then Length if Kind = Request_Store;

      Length : Payload_Length
         then Payload with Size => Length * 8;

      Payload : Opaque;
   end message;

The message contains three fields, two of which are conditional, based on the kind of request. The "store" request has a payload while the "get" request doesn’t.

This message variation is defined with the "then .. if .." construct. The "Kind" field is followed by the "Length" field if and only if the kind of request is "Request_Store".

The "Length" field is followed by a "Payload" field whose size (in bits) is specified by the value of "Length" itself.

The "Payload" field is specified as "Opaque", meaning that it is only being viewed as a sequence of bytes. RecordFlux provides a message refinement facility for treating such data as typed, but our protocol carries anonymous data.

Here’s what the full specification looks like, in a file named rsp.rflx:

package RSP is

   type Request_Kind is (Request_Store    	=> 21,
                     	 Request_Get      	=> 254)
  	with Size => 8;

   type Payload_Length is range 0 .. 255
  	with Size => 8;

   type Request_Message is
      message
         Kind : Request_Kind
            then Length if Kind = Request_Store;

         Length  : Payload_Length
            then Payload with Size => Length * 8;

         Payload : Opaque;

      end message;

end RSP;

Just like that, we’ve defined a variable length message.

Checking the specification and generating code

Now that we have the first version of our protocol specification, we can use the RecordFlux command line tool to check it and generate a parser and serializer for us:

$ mkdir generated
$ rflx generate -d generated/ rsp.rflx

The tool will first check if our specification is correct, then generate Ada/SPARK code (in the ‘generated’ directory) to parse and synthesize messages.

Among the things that rlfx will check are:

Field sizes are large enough for the data they contain
There are no cycles or ambiguities in our message
All fields are reachable and don’t overlap in unintended ways
The message has no holes or unused fields

The generated code consists of a set of packages that make up the RecordFlux infrastructure, together with packages that implement the message structure that we have just defined.

Let’s see how to use the generated code to craft a binary message.

Making a request

The most interesting part of the code that was generated is in the "RFLX.RSP.Request_Message" package.

First we need a "Context" object, this will handle the formatting of the messages. The context itself needs a buffer to build the message into. Planning ahead, since we want to send a message that has four bytes, and since the "Kind" and "Length" fields are one byte each, we will need to reserve six bytes for the buffer.

declare
   Buffer  : RFLX.RFLX_Types.Bytes_Ptr := new RFLX.RFLX_Types.Bytes'(0, 0, 0, 0, 0, 0);
   Context : Request_Message.Context;
begin
   Request_Message.Initialize (Context, Buffer);

Now we can start to craft our message, starting with the "Kind" field:

Request_Message.Set_Kind (Context, Request_Store);

"Request_Store" messages have a "Length" field so this is what we set next. For this example we want to send 4 bytes:

Request_Message.Set_Length (Context, 4);

And finally we set the payload:

Request_Message.Set_Payload (Context, (1, 2, 3, 4));

And that’s it! We have our request. We can now retrieve the buffer from the Context object and print it to see the contents:

Request_Message.Take_Buffer (Context, Buffer);

for Elt of Buffer.all loop
   Byte_IO.Put (Elt, Base => 16);
   Ada.Text_IO.Put (' ');
end loop;
Ada.Text_IO.New_Line;

And here’s the result:

16#15# 16#4# 16#1# 16#2# 16#3# 16#4#

A 6 bytes message, with the request kind, the length and the payload.

So what?

In just a couple of lines of RecordFlux and Ada code we crafted a message specification for our request/response protocol, and we generated a packet. But if you stop here, you are missing what makes RecordFlux a game changer.

First, the generated parser/serializer is provable SPARK code, which means that it is mathematically proved to be free of vulnerabilities such as buffer overflows, division by zero, integer overflows. It is also proved to be consistent with its specifications. Everyone can see the value of such strong guaranties for cybersecurity.

By using SPARK, not only the parser/serializer can be proved, but also the code using it. Let’s enable SPARK mode in our example. Now if we run the gnatprove tool, the proof will pass as our code is correct so far. But what if the protocol changes? Let’s introduce a new field in our request message and see what happens.

From now on the request message should contain a "stack_id" field that identifies one of 10 stacks where the data can be stored:

type Stack_Identifier is range 0 .. 9
   with Size => 8;

type Request_Message is
   message
      Stack_Id : Stack_Identifier;

      Kind : Request_Kind
         then Length if Kind = Request_Store;

      Length  : Payload_Length
         then Payload with Size => Length * 8;

      Payload : Opaque;

   end message;

If our code stays the same we are in big trouble. The "stack_id" field is not initialized, and we don’t know what its value will be in our message…

Thanks to SPARK and formal verification, this problem will be detected:

example.adb:35:19: medium: precondition might fail, cannot prove Valid_Predecessor (Ctx, Fld)
   35 |   Request_Message.Set_Kind (Context, Request_Store);
      |   ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  in inlined expression function body at rflx-rsp-request_message.ads:921

“Cannot prove Valid_Predecessor (Ctx, Fld)”: this precondition checks that the fields of the message are assigned in the correct order and that they are all filled. Here the new "stack_id" field should be set before the "kind" field, therefore this code cannot be proven against the specifications. We have to call the "Set_Stack_Id" procedure for the proof to pass.

What if I am not ready to switch to formal verification?

Well you are in luck! Because one of the unique properties of SPARK is that the contracts are all executable Ada code. When assertions are enabled, the compiler will generate code to check those contracts at run-time, and therefore any misuse of the protocol will trigger an exception.

With our example here if we enable assertions with the compiler switch "-gnata", and then run the application, we get this output:

raised ADA.ASSERTIONS.ASSERTION_ERROR : failed precondition from rflx-rsp-request_message.ads:487

Typically you would enable those run-time checks during the validation of the software (continuous integration, testing campaigns, etc.). But it’s also possible to enable them in production code for uncompromising safety. However keep in mind that there is a performance penalty in this case, those extra checks added by the compiler are executed by the CPU at run-time. So this is not as good as SPARK, whose static analysis provides uncompromising safety and uncompromising performance, but it is still a large improvement compared to using handmade protocol parser or generated parsers in another language.

Conclusion

I hope this very short introduction to RecordFlux sparked your interest in this technology. If you have any questions, please feel free to ask in the comment section below, or use the contact form on our website.

In a followup post I will cover some more features of the technology, in particular state machines, and implement a server for our little protocol with a maximum amount of generated SPARK code. In the meantime you can have a look at the code of this example here.

Posted in

About Fabien Chouteau

Fabien joined AdaCore in 2010 after his engineering degree at the EPITA (Paris). He is involved in real-time, embedded and hardware simulation technology. Maker/DIYer in his spare time, his projects include electronics, music and woodworking.