# Proving properties of constant-time crypto code in SPARKNaCl

## Introduction #

Over the last few months, I developed a SPARK version of the TweetNaCl cryptographic library. This was made public on GitHub here in February 2020, under the 2-clause BSD licence. TweetNaCl has achieved widespread popularity and adoption and is used to underpin many other projects. The JavaScript version of the library is used by many web sites, with the NPM package manager reporting over 14 Million downloads per week.

The README file at the top-level in that repository contains most of the background information that you'll need to build and use SPARKNaCl. The code has been proven "type safe" (aka "no run-time errors" or the "Silver" level of SPARK adoption) with the Community 2019 edition of SPARK.

This blog entry goes into a bit more technical detail on one particular aspect of the project: the challenge of re-writing and verifying "constant time" algorithms using SPARK. This is something of a big deal in crypto code. Some algorithms are known to "leak" information because their running time varies with respect to sensitive variables, like cryptographic keys, so a "Constant Time" programming style is adopted to prevent this kind of side-channel attack.

TweetNaCl is something of a masterclass in compact programming - it meets the unusual goal that the entire source code of the library can be transmitted in under 100 tweets. Naturally, this leads to a somewhat economical coding style, and a complete lack of comments in the code. A paper on the authors' website explains some of it, but leaves a lot of detail either unstated or assumed.

SPARKNaCl aims to reverse this situation, proving an alternative reference implementation of the NaCl API that is well-documented, provably free of runtime errors, and preserves the "constant time" nature of all the algorithms.

This blog entry concentrates on just one of the internal functions of TweetNaCl - a function called "Pack_25519".

Let's start with a look at the original C implementation, starting with a little background information on the types and data that we're working with.

TweetNaCl defines a few types and macros that we need to know about - namely:

#define FOR(i,n) for (i = 0; i < n; ++i)
#define sv static void
typedef unsigned char u8;
typedef long long i64;
typedef i64 gf[16];

In particular, the type "gf" is an array of 16 64-bit integers. This is actually used to represent a 256-bit "big integer" where each digit or "limb" is 16 bits, so in the range 0 .. 65535. (64 bits are used so these digits can "overflow" during intermediate computations...) A gf value where all the limbs are really in the range 0 .. 65535 we'll call a "Normal GF".

You also need to know that NaCl implements an Elliptic Curve called "Curve25519" where the "25519" is a reference to the modulus of the curve coordinates, which is actually 2**255 - 19 which (from now on) we will call P. The gf type represents such a value in little-endian format, so the limb at array index 0 represents the least-significant digit.

In the C code, there's a function called "pack25519" with declaration:

sv pack25519(u8 *o, const gf n);

A bit of reading and digging reveals that the parameter o is supposed to be pointing at an uninitialized array of 32 bytes, which is assigned in the function based on the value of the parameter n.

The TweetNaCl paper says that the purpose of Pack25519 is to "freeze integer mod 2**255 − 19 and store".

This gives a bit of a hint. Note that a Normal GF value stores sixteen 16-bit values, which is 256 bits (so representing integers in the range 0 .. 2**256 - 1) but that P is slightly less than half that upper bound. The point of Pack25519, then, is to take a Normal GF value (which might represent an integer greater than or equal to P) and reduce it modulo P, then "pack" the resulting value into 32 8-bit bytes.

This gives us enough to write a first specification for Pack_25519 in SPARK. In SPARK, it is perfectly normal for functions to return a composite type (such as an array) so it seems natural to use a function, not a procedure. We'll also need to introduce a few types as follows. For a longer explanation of the constants and how they are derived, see the SPARKNaCl sources in sparknacl.ads. For the complete source code for Pack_25519, see file sparknacl-utils.adb.

subtype I32 is Integer_32;
subtype N32 is I32 range 0 .. I32'Last;
subtype I64 is Integer_64;

subtype Index_32 is I32 range 0 .. 31;

type Byte_Seq is array (N32 range <>) of Byte;
subtype Bytes_32 is Byte_Seq (Index_32);

--  "LM"   = "Limb Modulus"
--  "LMM1" = "Limb Modulus Minus 1"
LM   : constant := 65536;
LMM1 : constant := 65535;
--  "R2256" = "Remainder of 2**256 (modulo 2**255-19)"
R2256 : constant := 38;

--  "Maximum GF Limb Coefficient"
MGFLC : constant := (R2256 * 15) + 1;

--  "Maximum GF Limb Product"
MGFLP : constant := LMM1 * LMM1;

subtype GF_Any_Limb is I64 range -LM .. (MGFLC * MGFLP);

type GF is array (Index_16) of GF_Any_Limb;

subtype GF_Normal_Limb is I64 range 0 .. LMM1;

subtype Normal_GF is GF
with Dynamic_Predicate =>
(for all I in Index_16 => Normal_GF (I) in GF_Normal_Limb);


Note that Normal_GF is a subtype of GF and uses Ada's "Dynamic_Predicate" aspect to constrain the value of each limb.

Finally, we can introduce a SPARK specification for Pack_25519:

--  Reduces N modulo (2**255 - 19) then packs the
--  value into 32 bytes little-endian.
function Pack_25519 (N : in Normal_GF) return Bytes_32
with Global => null;


## Modular reduction in variable time #

A "variable time" algorithm for modular reduction is easy - you just subtract P from the value of N until the result lies in 0 .. P-1 and you're done. But we're not allowed to do that. Try again!

A useful observation is that the value N can only lie in one of three interesting ranges. It might be in 0 .. P-1 already, in which case there's nothing to be done. It might lie in P .. 2P-1, in which case we can just subtract P once. Finally, if it lies in 2P .. 2**256-1, then we subtract P twice and return the result. Unfortunately, that isn't constant time either, so no good.

## Now in constant time... #

Let's have a look at how the TweetNaCl C code does it:

sv pack25519 (u8 *o, const gf n)
{
int i, j, b;
gf m, t;
FOR(i,16) t[i]=n[i];
car25519(t);
car25519(t);
car25519(t);
FOR(j, 2) {
m[0]=t[0]-0xffed;
for(i=1;i<15;i++) {
m[i]=t[i]-0xffff-((m[i-1]>>16)&1);
m[i-1]&=0xffff;
}
m[15]=t[15]-0x7fff-((m[14]>>16)&1);
b=(m[15]>>16)&1;
m[14]&=0xffff;
sel25519 (t, m, 1-b);
}
FOR(i, 16) {
o[2*i]=t[i]&0xff;
o[2*i+1]=t[i]>>8;
}
}


Eh? What on earth is going on here? Good question! Let's break it down a bit and look at each of the three outer FOR loops:

1. The first FOR is just an array assignment from n to t. No problem in SPARK.
2. The third FOR is taking sixteen 16-bit values in t and assigning them onto 32 8-bit values in o in "little-endian" format, so that's just a type-conversion function.
3. The middle FOR is interesting! It loops exactly twice. The body of the loop first subtracts 0xffed from limb 0 of t, then subtracts 0xffff from limbs 1 through 14 (with carry and reducing each limb mod 65536), then finally subtracts 0x7fff from limb 15. So...this is subtracting the little-endian representation of P from t.

Don't worry about those three calls to "car25519" - we'll come back to those.

## Diversion - Constant Time Conditional Swapping #

What about that call to "sel25519" at the end of the loop body. What does that do? The TweetNaCl paper says this is a "constant time conditional swap" for GF values. The C function prototype looks like this:

sv sel25519 (gf p, gf q, int b);

which doesn't help much. Note that "gf" is an array type, so parameter p and q are by-reference (so probably "in out" in Ada terms). The parameter b is a "int" here but is only ever called with values 0 or 1, so probably being interpreted as a Boolean value. This gives is a reasonable chance of writing a SPARK specification for it. I have changed the name to "CSwap" which seems more readable:

--  Constant time conditional swap of P and Q.
procedure CSwap (P    : in out GF;
Q    : in out GF;
Swap : in     Boolean)
with Global => null,
Contract_Cases =>
(Swap     => (P = Q'Old and Q = P'Old)
not Swap => (P = P'Old and Q = Q'Old));


The "Contract_Cases" aspect here says exactly what the procedure does - swapping P and Q (or not) depending on the value of Swap. It might be tempting to write the body trivially:

if Swap then
Temp := P;
P := Q;
Q := Temp;
end if;

but that won't do, since we require the running time to be constant, not even depending on the value of Swap. Let's see how the C code works:

sv sel25519 (gf p, gf q, int b)
{
i64 t, i, c = ~(b-1);
FOR(i, 16) {
t= c&(p[i]^q[i]);
p[i]^=t;
q[i]^=t;
}
}


Once again, this is less than obvious, but becomes clear when you consider the properties of the bitwise "xor" operator (^ in C), and the way the local variable c is initialised.

c is initialised to be "~(b-1)" where b can be only 0 or 1. If b is 0, then b-1 is -1. Bitwise "not" of -1 yields 0. Similarly, if b is 1, then c is initialised to -1 or (more usefully "all 1s" in binary).

This is best handled in SPARK with a constant look-up table:

type Bit_To_Swapmask_Table is array (Boolean) of U64;
(False => 16#0000_0000_0000_0000#,
True  => 16#FFFF_FFFF_FFFF_FFFF#);


Translating and proving the body in SPARK reveals a slightly ugly side to the C version - it uses bitwise operators (like ~ and ^) on signed integers. This is implementation-defined in C since the semantics fall back on whatever representation of the integers happens to be in use. In SPARK, there's no equivalent, so we're forced to use instantiations of Unchecked_Conversion to switch from I64 to U64 and back again. We also need to introduce an axiom for proof:

pragma Assume
(for all K in I64 => To_I64 (To_U64 (K)) = K);

Writing the body of CSwap in SPARK is then reasonably easy. For proof, we need a loop invariant - this states that after I loop iterations, the first I elements of P and Q have been swapped (or not...) It comes out like this:

procedure CSwap (P    : in out GF;
Q    : in out GF;
Swap : in     Boolean)
is
T : U64;
C : U64 := Bit_To_Swapmask (Swap);
begin
for I in Index_16 loop
T := C and (To_U64 (P (I)) xor To_U64 (Q (I)));
P (I) := To_I64 (To_U64 (P (I)) xor T);
Q (I) := To_I64 (To_U64 (Q (I)) xor T);

pragma Loop_Invariant
(if Swap then
(for all J in Index_16 range 0 .. I =>
(P (J) = Q'Loop_Entry (J) and
Q (J) = P'Loop_Entry (J)))
else
(for all J in Index_16 range 0 .. I =>
(P (J) = P'Loop_Entry (J) and
Q (J) = Q'Loop_Entry (J)))
);
end loop;
end CSwap;


I hope you can see how it works now. Remember that X xor 0 = X and that X xor X = 0.

Pleasingly, GNATprove discharges all the VCs for this subprogram with no further assistance. The full version of CSwap in SPARKNaCl also sanitizes the local variables T and C (i.e. erases their values from memory) - another strange and tricky idiom to master in crypto programming. This explains why C is declared as a variable and not a constant here.

## Subtracting P safely #

Having established what sel25519 (and its SPARK equivalent CSwap) is all about, we can now return to the Pack_25519 function. The key realisation is that the outer (second) loop always subtracts P from t twice, and then uses sel25519 to "swap" the right answer into t so it can be assigned to o.

Another key discovery is that the decision to swap (or not) is based on whether each subtraction did or didn't underflow (i.e. return a negative result). Imagine the case where N in is 0 .. P-1. If you subtract P from that then the result will be negative, telling you that you can discard the result of the subtraction and the answer you really want is N.

This means we need a "subtract P" procedure that returns both the result of the subtraction and a Boolean "Underflow" flag. The result might be negative so we use a GF subtype with a Dynamic_Predicate again, plus a pre- and post-condition to make sure we can apply Subtract_P more than once:

--  Subtracting P twice from a Normal_GF might result
--  in a GF where limb 15 can be negative with lower bound -65536
subtype Temp_GF_MSL is I64 range -LM .. LMM1;
subtype Temp_GF is GF
with Dynamic_Predicate =>
(Temp_GF (15) in Temp_GF_MSL and
(for all K in Index_16 range 0 .. 14 =>
Temp_GF (K) in GF_Normal_Limb));

procedure Subtract_P (T         : in     Temp_GF;
Result    :    out Temp_GF;
Underflow :    out Boolean)
with Global => null,
Pre    => T (15) >= -16#8000#,
Post   => (Result (15) >= T (15) - 16#8000#);


The body of Subtract_P is basically a simple translation of the C code, but with a suitable loop invariant, which is sufficient to establish the absence of run-time errors. We also need one more subtype, thus:

subtype I64_Bit is I64 range 0 .. 1;

procedure Subtract_P (T         : in     Temp_GF;
Result    :    out Temp_GF;
Underflow :    out Boolean)
is
Carry : I64_Bit;
R     : GF;
begin
R := (others => 0);

--  Limb 0 - subtract LSL of P, which is 16#FFED#
R (0) := T (0) - 16#FFED#;

--  Limbs 1 .. 14 - subtract FFFF with carry
for I in Index_16 range 1 .. 14 loop
Carry     := ASR_16 (R (I - 1)) mod 2;
R (I)     := T (I) - 16#FFFF# - Carry;
R (I - 1) := R (I - 1) mod LM;

pragma Loop_Invariant
(for all J in Index_16 range 0 .. I - 1 =>
R (J) in GF_Normal_Limb);
pragma Loop_Invariant (T in Temp_GF);
end loop;

--  Limb 15 - Subtract MSL (Most Significant Limb)
--  of P (16#7FFF#) with carry.
--  Note that Limb 15 might become negative on underflow
Carry  := ASR_16 (R (14)) mod 2;
R (15) := (T (15) - 16#7FFF#) - Carry;
R (14) := R (14) mod LM;

--  Note that R (15) is not normalized here, so that the
--  result of the first subtraction is numerically correct
--  as the input to the second.
Underflow := R (15) < 0;
Result    := R;
end Subtract_P;


## Back to Pack_25519 #

Finally, we're in a position to write the body of Pack_25519. It's rather simple now:

function Pack_25519 (N : in Normal_GF) return Bytes_32
is
L      : GF;
R1, R2 : Temp_GF;
First_Underflow  : Boolean;
Second_Underflow : Boolean;
begin
L := N;
Subtract_P (L,  R1, First_Underflow);
Subtract_P (R1, R2, Second_Underflow);
CSwap (R1, R2, Second_Underflow);
CSwap (L,  R2, First_Underflow);
end Pack_25519;

The trick is to arrange the calls to CSwap to always put the correct answer into R2. Consider the three cases we mentioned above.

1. if N in 0 .. P-1, then the first subtraction will underflow, and so will the second. In that case, the second call to CSwap puts L into R2, which is the right answer - the original value of N.
2. if N in P .. 2P-1, then the first subtraction will not underflow, but the second will. The first call to CSwap puts R1 into R2 (which is the result of the first subtraction, and is the right answer). The second CSwap does nothing.
3. if N in 2P .. 2**256-1, then neither subtraction underflows. Both calls to CSwap do nothing. The correct answer is the result of the second subtraction, which is already in R2. Phew!

The function To_Bytes_32 is just a simple type conversion from Normal_GF to Bytes_32, which I won't reproduce here - it's a simple translation of the third and final FOR loop in the C code.

But... there's a problem. Running GNATprove on this yields:

sparknacl-utils.adb:197:27: medium: predicate check might fail


Where line 197 is the return statement with the call to To_Bytes_32.

The problem is that R2 is a Temp_GF (which might have a negative value in the most significant limb), while the formal parameter of To_Bytes_32 is required to be a Normal_GF. Aha! We need to (somehow) keep track of the fact that the value in R2 is always a Normal_GF after the calls to Subtract_P and CSwap.

It suffices to establish that relationship with a post-condition on Subtract_P. In short, we need to know that a subtraction underflows if and only if the result of the Subtraction is not a Normal_GF. This means we extend the specification of Subtract_P as follows:

--  Result := T - P;
--  if     Underflow, then Result is not a Normal_GF
--  if not Underflow, then Result is     a Normal_GF
procedure Subtract_P (T         : in     Temp_GF;
Result    :    out Temp_GF;
Underflow :    out Boolean)
with Global => null,
Pre    => T (15) >= -16#8000#,
Post   => (Result (15) >= T (15) - 16#8000#) and then
(Underflow /= (Result in Normal_GF));


With that in place, GNATprove discharges all the VCs for Pack_25519.

## Aside 1 - what about those calls to car25519? #

In the TweetNaCl sources, pack25519 can deal with gf values that have limbs outside of the range 0 .. 65535. This means that the formal parameter n has be "normalised" before it can be packed. This involves a ripple-carry algorithm that normalises each limb mod 65536 and "carries" any extra bits into the next limb. That's what the car25519 function does, and it turns out you need to apply it three times to normalise any gf value to get a normal gf. In SPARKNaCl, the carrying function is applied earlier and more aggressively, so these calls don't appear in Pack_25519 (note that the formal parameter is already a Normal_GF, not "any old GF").

## Aside 2 - the 2014 bug #

In 2014, a bug was reported in TweetNaCl's implementation of the pack25519 function. If we try to reproduce that bug in SPARKNaCl, what happens?

The bug concerns the normalisation of limb 14 in Subtract_P. The correct code (line 133 in sparknacl-utils.adb) looks like this:

R (14) := R (14) mod LM;


The buggy (pre-2014) version normalised limb 15 instead. To re-introduce this bug into SPARKNaCl, we would change it to:

R (15) := R (15) mod LM;

Running GNATprove on that buggy version yields:

sparknacl-utils.adb:139:23: medium: predicate check might fail


Line 139 is the final assignment from R to Result. The "predicate check" here is trying to prove that R is a Temp_GF value, but this proof fails since limb 14 of R has not been reduced to be in 0 .. 65535.

It is pleasing to see how the use of simple subtypes and type-safety proof can still spot such a subtle functional bug in the original version of TweetNaCl.

## Conclusions and Further Work #

This blog entry has been rather long, and only shown how a single procedure from the C version of TweetNaCl has been translated and proved with SPARK. While some of the functions in TweetNaCl requires some real work to prove, most translated into SPARK and proved with very little effort. Some observation arise from this work:

1. SPARK's "Dynamic_Predicate" aspect proved to be invaluable. I think this is possibly an unsung "killer feature" of the latest version of Ada - it provides a massive change in the expressive power of the type system, and thus the capability of proof tools like GNATprove.

2. It is possible to write constant time and performance-critical code like this in SPARK.

3. Some Ada idioms are extremely useful for crypto programming - particularly the intrinsic Shift and Rotate operations for modular types, pragma Inspection_Point (for sanitising data), limited types (to ensure pass-by-reference and no copies of sensitive data are ever made), first-class composite types, and so on.

4. Once again, we seen that "type safety" proof (aka "Silver level" in SPARK) gets a lot of bang-for-the-buck in terms of assurance and also spotting functional bugs en-passant.

Under "Further Work", I plan to do some performance testing soon - pitting SPARKNaCl against TweetNaCl on a small RISC-V bare-metal target. I expect SPARKNaCl will be a bit slower to start with, but (having proved type safety), I expect the compiler's optimisation to perform well, so we'll see. Perhaps that's another blog entry to come...