Secure Use of Cryptographic Libraries: SPARK Binding for Libsodium

by Isabelle Vialard – Sep 03, 2019

The challenge faced by cryptography APIs is to make building functional and secure programs easy for the user. Even with good documentation and examples, this remains a challenge, especially because incorrect use is still possible. I made bindings for two C cryptography libraries, TweetNaCl (pronounced "Tweetsalt") and Libsodium, with the goal of making these bindings easier to use than the original APIs by automatically detecting a large set of incorrect uses. To do this, I wrote two bindings for each library: a low-level binding in Ada, and a higher-level one in SPARK, which I call the interface. I used Ada's strong typing and SPARK proofs to enforce safe and functional use of the subprograms in the library.

In this post I will explain the steps I took to create these bindings, and how to use them.

Steps to create a binding

I will use one program as example: Crypto_Box_Easy from Libsodium, a procedure which encrypts a message.

First, I generated a binding using the compiler's Ada spec dump option (-fdump-ada-spec):

gcc -c -fdump-ada-spec -C ./sodium.h

This gives the following function declaration:

function crypto_box_easy
     (c : access unsigned_char;
      m : access unsigned_char;
      mlen : Extensions.unsigned_long_long;
      n : access unsigned_char;
      pk : access unsigned_char;
      sk : access unsigned_char) return int  -- ./sodium/crypto_box.h:61
   with Import => True, 
        Convention => C, 
        External_Name => "crypto_box_easy";

Then I modified this binding. First, I changed the types and replaced the access parameters with parameters of mode in, out, or in out. Scalar parameters with mode out are passed through a temporary pointer on the C side, so this works even without explicit pointers. Unconstrained arrays like Block8 are more complex. In Ada, an unconstrained array is typically represented by a so-called fat pointer, that is, a pointer to the bounds of the array together with a pointer to its first element. C, however, expects just a pointer to the first element, so a naive binding like this one would not be expected to work. What saves the situation is this line, which forces the pointer to the first element to be passed directly:

with Convention => C;

Thus we go from a low-level view, where memory is addressed through pointers, to a typed Ada declaration:

function crypto_box_easy
     (c    :    out Block8;
      m    : in     Block8;
      mlen : in     Uint64;
      n    : in out Block8;
      pk   : in     Block8;
      sk   : in     Block8) return int  -- ./sodium/crypto_box.h:61
   with Import => True,
        Convention => C,
        External_Name => "crypto_box_easy";
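The array-passing behavior described above can be tried without Libsodium. The following is a minimal sketch, assuming a C standard library that provides memcmp. The local Block8 declaration and the Memcmp import are illustrative stand-ins, not part of the actual binding. The two arrays deliberately have different Ada bounds, to show that only the address of the first element reaches the C side.

```ada
with Ada.Text_IO;
with Interfaces;   use Interfaces;
with Interfaces.C; use Interfaces.C;

procedure Thin_Pointer_Demo is

   --  Illustrative stand-in for the article's Block8:
   --  an unconstrained array of bytes.
   type Block8 is array (Natural range <>) of Unsigned_8;

   --  Import of memcmp from the C standard library (an assumption of
   --  this sketch). Because the subprogram has Convention => C, each
   --  Block8 argument is passed as the address of its first element;
   --  the Ada bounds are not sent to C.
   function Memcmp (A, B : Block8; N : size_t) return int
     with Import        => True,
          Convention    => C,
          External_Name => "memcmp";

   --  Same contents, different Ada bounds.
   Left  : constant Block8 (1 .. 4)   := (16#DE#, 16#AD#, 16#BE#, 16#EF#);
   Right : constant Block8 (10 .. 13) := (16#DE#, 16#AD#, 16#BE#, 16#EF#);

begin
   if Memcmp (Left, Right, Left'Length) = 0 then
      Ada.Text_IO.Put_Line ("C saw identical bytes");
   else
      Ada.Text_IO.Put_Line ("C saw different bytes");
   end if;
end Thin_Pointer_Demo;
```

Since C never sees the bounds, the length has to travel as a separate argument, which is why the low-level binding keeps a parameter like mlen.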

After this, I created an interface in SPARK that uses this binding. The goal is to offer the same subprogram as in the binding, with some modifications. Redundant parameters are removed: for instance, the C function often asks for an array and the length of this array (like m and mlen), which is unnecessary since the length is available through the attribute 'Length. Functions with out parameters must be changed into procedures to comply with SPARK rules. Finally, new types can be introduced to take advantage of strong typing, along with preconditions and postconditions.

procedure Crypto_Box_Easy
     (C  :    out Cipher_Text;
      M  : in     Plain_Text;
      N  : in out Box_Nonce;
      PK : in     Box_Public_Key;
      SK : in     Box_Secret_Key)
   with
       Pre => C'Length = M'Length + Crypto_Box_MACBYTES
       and then Is_Signed (M)
       and then Never_Used_Yet (N);
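The step from the low-level function to the interface procedure can be sketched as follows. This is only an illustration of the pattern, under stated assumptions: Raw_Mask is a hypothetical pure-Ada stand-in for an imported C function (it just XORs bytes), and raising Program_Error on a non-zero status is one possible choice, not necessarily what the real interface does.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Wrapper_Demo is

   type Block8 is array (Natural range <>) of Unsigned_8;

   --  Hypothetical stand-in for a low-level binding. Like the C
   --  original, it takes the length explicitly and returns a status.
   function Raw_Mask
     (Dst : out Block8;
      Src : Block8;
      Len : Unsigned_64;
      Key : Unsigned_8) return Integer
   is
   begin
      for Offset in 0 .. Natural (Len) - 1 loop
         Dst (Dst'First + Offset) := Src (Src'First + Offset) xor Key;
      end loop;
      return 0;
   end Raw_Mask;

   --  Higher-level wrapper: a procedure instead of a function, and no
   --  length parameter because Src'Length already carries it.
   procedure Mask (Dst : out Block8; Src : Block8; Key : Unsigned_8)
     with Pre => Dst'Length = Src'Length;

   procedure Mask (Dst : out Block8; Src : Block8; Key : Unsigned_8) is
      Status : Integer;
   begin
      Status := Raw_Mask (Dst, Src, Src'Length, Key);
      if Status /= 0 then
         --  Assumption of this sketch: failures become exceptions.
         raise Program_Error with "low-level call failed";
      end if;
   end Mask;

   Input  : constant Block8 (1 .. 3) := (1, 2, 3);
   Output : Block8 (1 .. 3);

begin
   Mask (Output, Input, Key => 16#FF#);
   for B of Output loop
      Ada.Text_IO.Put (Unsigned_8'Image (B));
   end loop;
   Ada.Text_IO.New_Line;
end Wrapper_Demo;
```

The caller of Mask never supplies a length, so it cannot supply a wrong one.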

How to use strong typing, and why

Most of the parameters required by these programs are arrays: some hold messages, others hold keys, and so on. The type Block8 could be enough to represent them all:

type Block8 is array (Index range <>) of uint8;

But then anyone could use a key as a message, or a message as a key. To avoid that, in Libsodium I derived new types from Block8. For instance, Box_Public_Key and Box_Secret_Key are the key types used by the Crypto_Box_* programs. The type for messages is Plain_Text, and the type for messages after encryption is Cipher_Text. Thus I take advantage of Ada's strong-typing characteristics in order to enforce the right use of the programs and their parameters.
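As a minimal sketch of this idea, the program below derives two types from a local Block8 and shows that they cannot be mixed. The declarations are illustrative and do not reproduce the actual interface package; the key length of 32 and the helper First_Byte are assumptions of the example.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Strong_Typing_Demo is

   type Block8 is array (Natural range <>) of Unsigned_8;

   --  Distinct types derived from Block8: same representation, but
   --  the compiler treats them as incompatible with each other.
   type Plain_Text     is new Block8;
   type Box_Secret_Key is new Block8 (1 .. 32);

   --  Hypothetical consumer that only accepts a secret key.
   function First_Byte (SK : Box_Secret_Key) return Unsigned_8 is
     (SK (SK'First));

   Message : constant Plain_Text (1 .. 5) := (72, 101, 108, 108, 111);
   Key     : constant Box_Secret_Key      := (others => 16#2A#);

   --  Crossing back to the common ancestor needs an explicit
   --  conversion, which makes the crossing visible in the source.
   Raw : constant Block8 := Block8 (Message);

begin
   Ada.Text_IO.Put_Line (Unsigned_8'Image (First_Byte (Key)));
   Ada.Text_IO.Put_Line (Natural'Image (Raw'Length));

   --  Rejected at compile time if uncommented, because Plain_Text is
   --  not Box_Secret_Key:
   --  Ada.Text_IO.Put_Line (Unsigned_8'Image (First_Byte (Message)));
end Strong_Typing_Demo;
```

The mistake of passing a message where a key is expected is caught by the compiler, before any proof or test is run.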

With TweetNaCl, I did things a bit differently: I created the different types directly in the binding, for the same result. Since TweetNaCl is a very small library, it was faster that way. In Libsodium I chose to let my first binding stay as close as possible to the generated one, and to focus on the interface where I use strong-typing and contracts (preconditions and postconditions).

Preconditions and postconditions

Preconditions and postconditions serve the same purpose as derived types: they enforce a specific use of the programs. There are two kinds of conditions:

The first kind are mostly conditions on array lengths. They ensure that the program will not fail at run time. For instance:

Pre => C'Length = M'Length + Crypto_Box_MACBYTES;

This precondition says that the cipher text must be exactly Crypto_Box_MACBYTES bytes longer than the message we want to encrypt. If this condition is not met, execution will fail. Note that we can reference C'Length in the precondition even though C is an out parameter: the bounds of an out parameter of an array type are known when the call is made, so we can reason about its length in the precondition.
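When proof is not available, the same kind of contract can also be checked at run time. Here is a minimal sketch, assuming assertions are enabled through pragma Assertion_Policy. Seal is a hypothetical stand-in that only copies bytes, and MAC_Bytes is an illustrative constant playing the role of Crypto_Box_MACBYTES.

```ada
with Ada.Assertions;
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Length_Contract_Demo is

   --  Ask the compiler to check contracts at run time.
   pragma Assertion_Policy (Check);

   --  Illustrative stand-in for Crypto_Box_MACBYTES.
   MAC_Bytes : constant := 16;

   type Block8 is array (Natural range <>) of Unsigned_8;

   --  C is an out parameter, but its bounds are known at the call,
   --  so C'Length can appear in the precondition.
   procedure Seal (C : out Block8; M : Block8)
     with Pre => C'Length = M'Length + MAC_Bytes;

   --  Hypothetical body: no cryptography, just a copy after a
   --  zeroed header of MAC_Bytes bytes.
   procedure Seal (C : out Block8; M : Block8) is
   begin
      C := (others => 0);
      C (C'First + MAC_Bytes .. C'Last) := M;
   end Seal;

   Message : constant Block8 (1 .. 8) := (others => 1);
   Good    : Block8 (1 .. 8 + MAC_Bytes);
   Short   : Block8 (1 .. 8);

begin
   Seal (Good, Message);
   Ada.Text_IO.Put_Line ("well-sized buffer accepted");

   begin
      Seal (Short, Message);
      Ada.Text_IO.Put_Line ("unexpected: short buffer accepted");
   exception
      when Ada.Assertions.Assertion_Error =>
         Ada.Text_IO.Put_Line ("short buffer rejected by precondition");
   end;
end Length_Contract_Demo;
```

With GNATprove, the second call would instead be reported before the program ever runs.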

The other kind of condition guards against unsafe use of the programs. For instance, Crypto_Box uses a Nonce. A Nonce is a small array used as a complement to a key. In theory, to be safe, a key should be long and used for only one message. However, generating a new long key for each message is costly. So we use the same long key for every message, together with a Nonce that is different for each message but easy to generate. The encryption is then safe, but only if we remember to use a different Nonce for each message. To ensure that, I wrote this precondition:

Pre => Never_Used_Yet (N)

Never_Used_Yet is a ghost function, that is, a function that doesn't affect the program's behavior.

type Box_Nonce is limited private;
function Never_Used_Yet (N: Box_Nonce) return Boolean with Ghost;

When GNATprove is run, it sees that procedure Randombytes (N : out Box_Nonce), the procedure that randomly generates the Nonce, has the postcondition Never_Used_Yet (N). So it deduces that when the Nonce is first generated, Never_Used_Yet (N) is true. Thus the first time N is used by Crypto_Box_Easy, the precondition holds. But if N is used a second time, GNATprove cannot prove that Never_Used_Yet (N) is still true (because N is an in out parameter, so it could have been changed). That's why it cannot prove a program that calls Crypto_Box_Easy twice with the same Nonce:

[Figure: GNATprove output when a Nonce is used on more than one message]

There are ways around this condition: for instance, one could copy a randomly generated Nonce many times to use it on different messages. To prevent that, Box_Nonce is declared as limited private: it cannot be copied, and GNATprove cannot prove that a copied Nonce has the same Never_Used_Yet property as a generated one.

Box_Nonce and Never_Used_Yet are declared in the private part of the package with SPARK_Mode Off so that GNATprove treats them as opaque entities:

private
   pragma SPARK_Mode (Off);

   type Box_Nonce is new Block8 (1 .. Crypto_Box_NONCEBYTES);

   function Never_Used_Yet (N : Box_Nonce) return Boolean is (True);

Never_Used_Yet always returns True. It is a fake implementation; what matters is that it is hidden from the proof. It works at run time because it is only ever used as a positive condition: as a program requirement it always appears as "Never_Used_Yet (N)" and never as "not Never_Used_Yet (N)", so the conditions always hold, and Never_Used_Yet doesn't affect the program's behavior even if contracts are executed at run time.
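Putting the pieces together, the calling pattern looks roughly like the sketch below. It is written as a single Ada program so that it can be compiled on its own, which means the package is nested in a procedure and carries no SPARK_Mode annotations. To get the proof behavior described above, it would have to be a library-level package with the SPARK_Mode settings shown earlier. Generate and Consume are hypothetical names standing in for the nonce generator and for an encryption call, and Generate does not produce random data here.

```ada
with Ada.Text_IO;

procedure Nonce_Demo is

   package Nonces is
      --  Limited private: clients cannot copy or inspect a nonce.
      type Box_Nonce is limited private;

      function Never_Used_Yet (N : Box_Nonce) return Boolean with Ghost;

      --  Hypothetical generator: the only way to obtain a nonce that
      --  is known to satisfy Never_Used_Yet.
      procedure Generate (N : out Box_Nonce)
        with Post => Never_Used_Yet (N);

      --  Hypothetical consumer, standing in for an encryption call.
      --  Mode in out means nothing is known about N afterwards.
      procedure Consume (N : in out Box_Nonce)
        with Pre => Never_Used_Yet (N);
   private
      type Box_Nonce is array (1 .. 24) of Natural;

      --  Fake implementation, as in the article.
      function Never_Used_Yet (N : Box_Nonce) return Boolean is (True);
   end Nonces;

   package body Nonces is
      procedure Generate (N : out Box_Nonce) is
      begin
         N := (others => 0);  --  placeholder, not random
      end Generate;

      procedure Consume (N : in out Box_Nonce) is
      begin
         N (1) := N (1) + 1;
      end Consume;
   end Nonces;

   Fresh : Nonces.Box_Nonce;

   --  Rejected at compile time if uncommented, because a limited
   --  object cannot be initialized by copying another object:
   --  Duplicate : Nonces.Box_Nonce := Fresh;

begin
   Nonces.Generate (Fresh);
   Nonces.Consume (Fresh);  --  justified by Generate's postcondition

   --  A second Nonces.Consume (Fresh) here would still compile; it is
   --  the proof tool, not the compiler, that is meant to reject it.
   Ada.Text_IO.Put_Line ("nonce generated and used once");
end Nonce_Demo;
```

The compiler enforces the no-copy rule, while the single-use rule is left to GNATprove.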

Another example of a precondition built on a ghost function is Is_Signed (M : Plain_Text). When you send an encrypted message to someone, you want this person to be able to check that the message is from you, so that no one can impersonate you. To do this, you have to sign your message with Crypto_Sign_Easy before encrypting it with Crypto_Box_Easy. Trying to skip the signing step leads to a proof error:

[Figure: GNATprove output when an unsigned message is encrypted]

How to use Libsodium_Binding

The repository contains:

• The project file libsodium.gpr

• The library directory lib

• The common directory which contains:

◦ The libsodium_binding package, a low-level binding in Ada made from the files generated using the Ada spec dump compiler.

◦ The libsodium_interface package, a higher level binding in SPARK which uses libsodium_binding.

• include, a directory which contains the headers of libsodium.

• libsodium_body, a directory which contains the bodies.

• outside_src, a directory which contains the headers that were removed from include, to fix a problem of double definition.

• thin_binding, a directory which contains the binding generated using the Ada spec dump compiler.

• The test directory which contains tests for each group of functions.

• A testsuite which verifies the same tests as the ones in the test directory.

• The examples directory, with examples that use different groups of programs together. It also contains a program where a Nonce is used twice, and as expected it fails at proof stage.

outside_src and thin_binding are not used for the binding, but I left them in the repository because they show what I changed from the original libsodium sources and from the generated Ada binding.

This is a library project, so the lib directory is the only thing needed to use it.

How to use TweetNaCl_Binding

The repository contains:

• The project file tweetnacl.gpr

• The common directory which contains:

◦ The tweetnacl_binding package, a low-level binding in Ada made from the files generated using the Ada spec dump compiler.

◦ The tweetnacl_interface package, a higher level binding in SPARK which uses tweetnacl_binding.

◦ tweetnacl.h and tweetnacl.c, the header and the body of the library

◦ randombytes.c, which holds randombytes, a function that generates random byte arrays.

• The test directory: test1 and test1b are functional examples of how to use the main TweetNaCl programs; the others are examples of what happens if you give an array with the wrong size, if you try to use the same nonce twice, etc. They fail either at execution or at the proof stage.

To use this binding, you just have to include the common directory in the Sources of your project file.


About Isabelle Vialard

Isabelle is a student at École Polytechnique, working on binding C cryptographic libraries in SPARK as part of a summer internship at AdaCore.