AdaCore Blog

Proving properties of constant-time crypto code in SPARKNaCl

Proving properties of constant-time crypto code in SPARKNaCl

by Roderick Chapman (Protean Code Limited) Guest Author

Intro­duc­tion #

Over the last few months, I devel­oped a SPARK ver­sion of the Tweet­Na­Cl cryp­to­graph­ic library. This was made pub­lic on GitHub here in Feb­ru­ary 2020, under the 2‑clause BSD licence. Tweet­Na­Cl has achieved wide­spread pop­u­lar­i­ty and adop­tion and is used to under­pin many oth­er projects. The JavaScript ver­sion of the library is used by many web sites, with the NPM pack­age man­ag­er report­ing over 14 Mil­lion down­loads per week.

The README file at the top-lev­el in that repos­i­to­ry con­tains most of the back­ground infor­ma­tion that you’ll need to build and use SPARK­Na­Cl. The code has been proven type safe” (aka no run-time errors” or the Sil­ver” lev­el of SPARK adop­tion) with the Com­mu­ni­ty 2019 edi­tion of SPARK.

This blog entry goes into a bit more tech­ni­cal detail on one par­tic­u­lar aspect of the project: the chal­lenge of re-writ­ing and ver­i­fy­ing con­stant time” algo­rithms using SPARK. This is some­thing of a big deal in cryp­to code. Some algo­rithms are known to leak” infor­ma­tion because their run­ning time varies with respect to sen­si­tive vari­ables, like cryp­to­graph­ic keys, so a Con­stant Time” pro­gram­ming style is adopt­ed to pre­vent this kind of side-chan­nel attack.

Tweet­Na­Cl is some­thing of a mas­ter­class in com­pact pro­gram­ming — it meets the unusu­al goal that the entire source code of the library can be trans­mit­ted in under 100 tweets. Nat­u­ral­ly, this leads to a some­what eco­nom­i­cal cod­ing style, and a com­plete lack of com­ments in the code. A paper on the authors’ web­site explains some of it, but leaves a lot of detail either unstat­ed or assumed.

SPARK­Na­Cl aims to reverse this sit­u­a­tion, prov­ing an alter­na­tive ref­er­ence imple­men­ta­tion of the NaCl API that is well-doc­u­ment­ed, prov­ably free of run­time errors, and pre­serves the con­stant time” nature of all the algorithms.

This blog entry con­cen­trates on just one of the inter­nal func­tions of Tweet­Na­Cl — a func­tion called Pack_​25519”.

Let’s start with a look at the orig­i­nal C imple­men­ta­tion, start­ing with a lit­tle back­ground infor­ma­tion on the types and data that we’re work­ing with.

Tweet­Na­Cl defines a few types and macros that we need to know about — namely:

#define FOR(i,n) for (i = 0; i < n; ++i)
#define sv static void
typedef unsigned char u8;
typedef long long i64;
typedef i64 gf[16];

In par­tic­u­lar, the type gf” is an array of 16 64-bit inte­gers. This is actu­al­ly used to rep­re­sent a 256-bit big inte­ger” where each dig­it or limb” is 16 bits, so in the range 0 .. 65535. (64 bits are used so these dig­its can over­flow” dur­ing inter­me­di­ate com­pu­ta­tions…) A gf val­ue where all the limbs are real­ly in the range 0 .. 65535 we’ll call a Nor­mal GF”.

You also need to know that NaCl imple­ments an Ellip­tic Curve called Curve25519” where the 25519” is a ref­er­ence to the mod­u­lus of the curve coor­di­nates, which is actu­al­ly 2**255 — 19 which (from now on) we will call P. The gf type rep­re­sents such a val­ue in lit­tle-endi­an for­mat, so the limb at array index 0 rep­re­sents the least-sig­nif­i­cant digit.

In the C code, there’s a func­tion called pack25519” with declaration:

sv pack25519(u8 *o, const gf n);

A bit of read­ing and dig­ging reveals that the para­me­ter o is sup­posed to be point­ing at an unini­tial­ized array of 32 bytes, which is assigned in the func­tion based on the val­ue of the para­me­ter n.

The Tweet­Na­Cl paper says that the pur­pose of Pack25519 is to freeze inte­ger mod 2**25519 and store”.

This gives a bit of a hint. Note that a Nor­mal GF val­ue stores six­teen 16-bit val­ues, which is 256 bits (so rep­re­sent­ing inte­gers in the range 0 .. 2**256 — 1) but that P is slight­ly less than half that upper bound. The point of Pack25519, then, is to take a Nor­mal GF val­ue (which might rep­re­sent an inte­ger greater than or equal to P) and reduce it mod­u­lo P, then pack” the result­ing val­ue into 32 8‑bit bytes.

This gives us enough to write a first spec­i­fi­ca­tion for Pack_​25519 in SPARK. In SPARK, it is per­fect­ly nor­mal for func­tions to return a com­pos­ite type (such as an array) so it seems nat­ur­al to use a func­tion, not a pro­ce­dure. We’ll also need to intro­duce a few types as fol­lows. For a longer expla­na­tion of the con­stants and how they are derived, see the SPARK­Na­Cl sources in sparknacl.ads. For the com­plete source code for Pack_​25519, see file sparknacl-utils.adb.

subtype I32 is Integer_32;
subtype N32 is I32 range 0 .. I32'Last;
subtype I64 is Integer_64;

subtype Index_32 is I32 range 0 .. 31;

type Byte_Seq is array (N32 range <>) of Byte;
subtype Bytes_32 is Byte_Seq (Index_32);

--  "LM"   = "Limb Modulus"
--  "LMM1" = "Limb Modulus Minus 1"
LM   : constant := 65536;
LMM1 : constant := 65535;
--  "R2256" = "Remainder of 2**256 (modulo 2**255-19)"
R2256 : constant := 38;

--  "Maximum GF Limb Coefficient"
MGFLC : constant := (R2256 * 15) + 1;

--  "Maximum GF Limb Product"
MGFLP : constant := LMM1 * LMM1;

subtype GF_Any_Limb is I64 range -LM .. (MGFLC * MGFLP);

type GF is array (Index_16) of GF_Any_Limb;

subtype GF_Normal_Limb is I64 range 0 .. LMM1;

subtype Normal_GF is GF
  with Dynamic_Predicate =>
     (for all I in Index_16 => Normal_GF (I) in GF_Normal_Limb);

Note that Normal_​GF is a sub­type of GF and uses Ada’s Dynamic_​Predicate” aspect to con­strain the val­ue of each limb.

Final­ly, we can intro­duce a SPARK spec­i­fi­ca­tion for Pack_25519:

--  Reduces N modulo (2**255 - 19) then packs the
--  value into 32 bytes little-endian.
function Pack_25519 (N : in Normal_GF) return Bytes_32
  with Global => null;

Mod­u­lar reduc­tion in vari­able time #

A vari­able time” algo­rithm for mod­u­lar reduc­tion is easy — you just sub­tract P from the val­ue of N until the result lies in 0 .. P‑1 and you’re done. But we’re not allowed to do that. Try again!

A use­ful obser­va­tion is that the val­ue N can only lie in one of three inter­est­ing ranges. It might be in 0 .. P‑1 already, in which case there’s noth­ing to be done. It might lie in P .. 2P1, in which case we can just sub­tract P once. Final­ly, if it lies in 2P .. 2**256 – 1, then we sub­tract P twice and return the result. Unfor­tu­nate­ly, that isn’t con­stant time either, so no good.

Now in con­stant time… #

Let’s have a look at how the Tweet­Na­Cl C code does it:

sv pack25519 (u8 *o, const gf n)
{
  int i, j, b;
  gf m, t;
  FOR(i,16) t[i]=n[i];
  car25519(t);
  car25519(t);
  car25519(t);
  FOR(j, 2) {
    m[0]=t[0]-0xffed;
    for(i=1;i<15;i++) {
      m[i]=t[i]-0xffff-((m[i-1]>>16)&1);
      m[i-1]&=0xffff;
    }
    m[15]=t[15]-0x7fff-((m[14]>>16)&1);
    b=(m[15]>>16)&1;
    m[14]&=0xffff;
    sel25519 (t, m, 1-b);
  }
  FOR(i, 16) {
    o[2*i]=t[i]&0xff;
    o[2*i+1]=t[i]>>8;
  }
}

Eh? What on earth is going on here? Good ques­tion! Let’s break it down a bit and look at each of the three out­er FOR loops:

  1. The first FOR is just an array assign­ment from n to t. No prob­lem in SPARK.
  2. The third FOR is tak­ing six­teen 16-bit val­ues in t and assign­ing them onto 32 8‑bit val­ues in o in lit­tle-endi­an” for­mat, so that’s just a type-con­ver­sion function.
  3. The mid­dle FOR is inter­est­ing! It loops exact­ly twice. The body of the loop first sub­tracts 0xffed from limb 0 of t, then sub­tracts 0xffff from limbs 1 through 14 (with car­ry and reduc­ing each limb mod 65536), then final­ly sub­tracts 0x7fff from limb 15. So…this is sub­tract­ing the lit­tle-endi­an rep­re­sen­ta­tion of P from t.

Don’t wor­ry about those three calls to car25519” — we’ll come back to those.

Diver­sion — Con­stant Time Con­di­tion­al Swap­ping #

What about that call to sel25519” at the end of the loop body. What does that do? The Tweet­Na­Cl paper says this is a con­stant time con­di­tion­al swap” for GF val­ues. The C func­tion pro­to­type looks like this:

sv sel25519 (gf p, gf q, int b);

which doesn’t help much. Note that gf” is an array type, so para­me­ter p and q are by-ref­er­ence (so prob­a­bly in out” in Ada terms). The para­me­ter b is a int” here but is only ever called with val­ues 0 or 1, so prob­a­bly being inter­pret­ed as a Boolean val­ue. This gives is a rea­son­able chance of writ­ing a SPARK spec­i­fi­ca­tion for it. I have changed the name to CSwap” which seems more readable:

--  Constant time conditional swap of P and Q.
procedure CSwap (P    : in out GF;
                 Q    : in out GF;
                 Swap : in     Boolean)
  with Global => null,
       Contract_Cases =>
         (Swap     => (P = Q'Old and Q = P'Old)
          not Swap => (P = P'Old and Q = Q'Old));

The Contract_​Cases” aspect here says exact­ly what the pro­ce­dure does — swap­ping P and Q (or not) depend­ing on the val­ue of Swap. It might be tempt­ing to write the body trivially:

if Swap then
   Temp := P;
   P := Q;
   Q := Temp;
end if;

but that won’t do, since we require the run­ning time to be con­stant, not even depend­ing on the val­ue of Swap. Let’s see how the C code works:

sv sel25519 (gf p, gf q, int b)
{
  i64 t, i, c = ~(b-1);
  FOR(i, 16) {
    t= c&(p[i]^q[i]);
    p[i]^=t;
    q[i]^=t;
  }
}

Once again, this is less than obvi­ous, but becomes clear when you con­sid­er the prop­er­ties of the bit­wise xor” oper­a­tor (^ in C), and the way the local vari­able c is initialised.

c is ini­tialised to be “~(b‑1)” where b can be only 0 or 1. If b is 0, then b‑1 is ‑1. Bit­wise not” of ‑1 yields 0. Sim­i­lar­ly, if b is 1, then c is ini­tialised to ‑1 or (more use­ful­ly all 1s” in binary).

This is best han­dled in SPARK with a con­stant look-up table:

type Bit_To_Swapmask_Table is array (Boolean) of U64;
Bit_To_Swapmask : constant Bit_To_Swapmask_Table :=
  (False => 16#0000_0000_0000_0000#,
   True  => 16#FFFF_FFFF_FFFF_FFFF#);

Trans­lat­ing and prov­ing the body in SPARK reveals a slight­ly ugly side to the C ver­sion — it uses bit­wise oper­a­tors (like ~ and ^) on signed inte­gers. This is imple­men­ta­tion-defined in C since the seman­tics fall back on what­ev­er rep­re­sen­ta­tion of the inte­gers hap­pens to be in use. In SPARK, there’s no equiv­a­lent, so we’re forced to use instan­ti­a­tions of Unchecked_​Conversion to switch from I64 to U64 and back again. We also need to intro­duce an axiom for proof:

pragma Assume
  (for all K in I64 => To_I64 (To_U64 (K)) = K);

Writ­ing the body of CSwap in SPARK is then rea­son­ably easy. For proof, we need a loop invari­ant — this states that after I loop iter­a­tions, the first I ele­ments of P and Q have been swapped (or not…) It comes out like this:

procedure CSwap (P    : in out GF;
                 Q    : in out GF;
                 Swap : in     Boolean)
is
   T : U64;
   C : U64 := Bit_To_Swapmask (Swap);
begin
   for I in Index_16 loop
      T := C and (To_U64 (P (I)) xor To_U64 (Q (I)));
      P (I) := To_I64 (To_U64 (P (I)) xor T);
      Q (I) := To_I64 (To_U64 (Q (I)) xor T);

      pragma Loop_Invariant
        (if Swap then
           (for all J in Index_16 range 0 .. I =>
                (P (J) = Q'Loop_Entry (J) and
                 Q (J) = P'Loop_Entry (J)))
         else
           (for all J in Index_16 range 0 .. I =>
                (P (J) = P'Loop_Entry (J) and
                 Q (J) = Q'Loop_Entry (J)))
        );
   end loop;
end CSwap;

I hope you can see how it works now. Remem­ber that X xor 0 = X and that X xor X = 0.

Pleas­ing­ly, GNAT­prove dis­charges all the VCs for this sub­pro­gram with no fur­ther assis­tance. The full ver­sion of CSwap in SPARK­Na­Cl also san­i­tizes the local vari­ables T and C (i.e. eras­es their val­ues from mem­o­ry) — anoth­er strange and tricky idiom to mas­ter in cryp­to pro­gram­ming. This explains why C is declared as a vari­able and not a con­stant here.

Sub­tract­ing P safe­ly #

Hav­ing estab­lished what sel25519 (and its SPARK equiv­a­lent CSwap) is all about, we can now return to the Pack_​25519 func­tion. The key real­i­sa­tion is that the out­er (sec­ond) loop always sub­tracts P from t twice, and then uses sel25519 to swap” the right answer into t so it can be assigned to o.

Anoth­er key dis­cov­ery is that the deci­sion to swap (or not) is based on whether each sub­trac­tion did or didn’t under­flow (i.e. return a neg­a­tive result). Imag­ine the case where N in is 0 .. P‑1. If you sub­tract P from that then the result will be neg­a­tive, telling you that you can dis­card the result of the sub­trac­tion and the answer you real­ly want is N.

This means we need a sub­tract P” pro­ce­dure that returns both the result of the sub­trac­tion and a Boolean Under­flow” flag. The result might be neg­a­tive so we use a GF sub­type with a Dynamic_​Predicate again, plus a pre- and post-con­di­tion to make sure we can apply Subtract_​P more than once:

--  Subtracting P twice from a Normal_GF might result
--  in a GF where limb 15 can be negative with lower bound -65536
subtype Temp_GF_MSL is I64 range -LM .. LMM1;
subtype Temp_GF is GF
  with Dynamic_Predicate =>
    (Temp_GF (15) in Temp_GF_MSL and
      (for all K in Index_16 range 0 .. 14 =>
         Temp_GF (K) in GF_Normal_Limb));

procedure Subtract_P (T         : in     Temp_GF;
                      Result    :    out Temp_GF;
                      Underflow :    out Boolean)
  with Global => null,
       Pre    => T (15) >= -16#8000#,
       Post   => (Result (15) >= T (15) - 16#8000#);

The body of Subtract_​P is basi­cal­ly a sim­ple trans­la­tion of the C code, but with a suit­able loop invari­ant, which is suf­fi­cient to estab­lish the absence of run-time errors. We also need one more sub­type, thus:

subtype I64_Bit is I64 range 0 .. 1;

procedure Subtract_P (T         : in     Temp_GF;
                      Result    :    out Temp_GF;
                      Underflow :    out Boolean)
is
   Carry : I64_Bit;
   R     : GF;
begin
   R := (others => 0);

   --  Limb 0 - subtract LSL of P, which is 16#FFED#
   R (0) := T (0) - 16#FFED#;

   --  Limbs 1 .. 14 - subtract FFFF with carry
   for I in Index_16 range 1 .. 14 loop
      Carry     := ASR_16 (R (I - 1)) mod 2;
      R (I)     := T (I) - 16#FFFF# - Carry;
      R (I - 1) := R (I - 1) mod LM;

      pragma Loop_Invariant
        (for all J in Index_16 range 0 .. I - 1 =>
           R (J) in GF_Normal_Limb);
      pragma Loop_Invariant (T in Temp_GF);
   end loop;

   --  Limb 15 - Subtract MSL (Most Significant Limb)
   --  of P (16#7FFF#) with carry.
   --  Note that Limb 15 might become negative on underflow
   Carry  := ASR_16 (R (14)) mod 2;
   R (15) := (T (15) - 16#7FFF#) - Carry;
   R (14) := R (14) mod LM;

   --  Note that R (15) is not normalized here, so that the
   --  result of the first subtraction is numerically correct
   --  as the input to the second.
   Underflow := R (15) < 0;
   Result    := R;
end Subtract_P;

Back to Pack_​25519 #

Final­ly, we’re in a posi­tion to write the body of Pack_​25519. It’s rather sim­ple now:

function Pack_25519 (N : in Normal_GF) return Bytes_32
is  
   L      : GF;
   R1, R2 : Temp_GF;
   First_Underflow  : Boolean;
   Second_Underflow : Boolean;
begin
   L := N;
   Subtract_P (L,  R1, First_Underflow);
   Subtract_P (R1, R2, Second_Underflow);
   CSwap (R1, R2, Second_Underflow);
   CSwap (L,  R2, First_Underflow);
   return To_Bytes_32 (R2);
end Pack_25519;

The trick is to arrange the calls to CSwap to always put the cor­rect answer into R2. Con­sid­er the three cas­es we men­tioned above.

  1. if N in 0 .. P‑1, then the first sub­trac­tion will under­flow, and so will the sec­ond. In that case, the sec­ond call to CSwap puts L into R2, which is the right answer — the orig­i­nal val­ue of N.
  2. if N in P .. 2P1, then the first sub­trac­tion will not under­flow, but the sec­ond will. The first call to CSwap puts R1 into R2 (which is the result of the first sub­trac­tion, and is the right answer). The sec­ond CSwap does nothing.
  3. if N in 2P .. 2**256 – 1, then nei­ther sub­trac­tion under­flows. Both calls to CSwap do noth­ing. The cor­rect answer is the result of the sec­ond sub­trac­tion, which is already in R2. Phew!

The func­tion To_​Bytes_​32 is just a sim­ple type con­ver­sion from Normal_​GF to Bytes_​32, which I won’t repro­duce here — it’s a sim­ple trans­la­tion of the third and final FOR loop in the C code.

But… there’s a prob­lem. Run­ning GNAT­prove on this yields:

sparknacl-utils.adb:197:27: medium: predicate check might fail

Where line 197 is the return state­ment with the call to To_Bytes_32.

The prob­lem is that R2 is a Temp_​GF (which might have a neg­a­tive val­ue in the most sig­nif­i­cant limb), while the for­mal para­me­ter of To_​Bytes_​32 is required to be a Normal_​GF. Aha! We need to (some­how) keep track of the fact that the val­ue in R2 is always a Normal_​GF after the calls to Subtract_​P and CSwap.

It suf­fices to estab­lish that rela­tion­ship with a post-con­di­tion on Subtract_​P. In short, we need to know that a sub­trac­tion under­flows if and only if the result of the Sub­trac­tion is not a Normal_​GF. This means we extend the spec­i­fi­ca­tion of Subtract_​P as follows:

--  Result := T - P;
--  if     Underflow, then Result is not a Normal_GF
--  if not Underflow, then Result is     a Normal_GF
procedure Subtract_P (T         : in     Temp_GF;
                      Result    :    out Temp_GF;
                      Underflow :    out Boolean)
        with Global => null,
             Pre    => T (15) >= -16#8000#,
             Post   => (Result (15) >= T (15) - 16#8000#) and then
                       (Underflow /= (Result in Normal_GF));

With that in place, GNAT­prove dis­charges all the VCs for Pack_25519.

Aside 1 — what about those calls to car25519? #

In the Tweet­Na­Cl sources, pack25519 can deal with gf val­ues that have limbs out­side of the range 0 .. 65535. This means that the for­mal para­me­ter n has be nor­malised” before it can be packed. This involves a rip­ple-car­ry algo­rithm that nor­malis­es each limb mod 65536 and car­ries” any extra bits into the next limb. That’s what the car25519 func­tion does, and it turns out you need to apply it three times to nor­malise any gf val­ue to get a nor­mal gf. In SPARK­Na­Cl, the car­ry­ing func­tion is applied ear­li­er and more aggres­sive­ly, so these calls don’t appear in Pack_​25519 (note that the for­mal para­me­ter is already a Normal_​GF, not any old GF”).

Aside 2 — the 2014 bug #

In 2014, a bug was report­ed in TweetNaCl’s imple­men­ta­tion of the pack25519 func­tion. If we try to repro­duce that bug in SPARK­Na­Cl, what happens?

The bug con­cerns the nor­mal­i­sa­tion of limb 14 in Subtract_​P. The cor­rect code (line 133 in sparknacl-utils.adb) looks like this:

R (14) := R (14) mod LM;

The bug­gy (pre-2014) ver­sion nor­malised limb 15 instead. To re-intro­duce this bug into SPARK­Na­Cl, we would change it to:

R (15) := R (15) mod LM;

Run­ning GNAT­prove on that bug­gy ver­sion yields:

sparknacl-utils.adb:139:23: medium: predicate check might fail

Line 139 is the final assign­ment from R to Result. The pred­i­cate check” here is try­ing to prove that R is a Temp_​GF val­ue, but this proof fails since limb 14 of R has not been reduced to be in 0 .. 65535.

It is pleas­ing to see how the use of sim­ple sub­types and type-safe­ty proof can still spot such a sub­tle func­tion­al bug in the orig­i­nal ver­sion of TweetNaCl.

Con­clu­sions and Fur­ther Work #

This blog entry has been rather long, and only shown how a sin­gle pro­ce­dure from the C ver­sion of Tweet­Na­Cl has been trans­lat­ed and proved with SPARK. While some of the func­tions in Tweet­Na­Cl requires some real work to prove, most trans­lat­ed into SPARK and proved with very lit­tle effort. Some obser­va­tion arise from this work:

  1. SPARK’s Dynamic_​Predicate” aspect proved to be invalu­able. I think this is pos­si­bly an unsung killer fea­ture” of the lat­est ver­sion of Ada — it pro­vides a mas­sive change in the expres­sive pow­er of the type sys­tem, and thus the capa­bil­i­ty of proof tools like GNATprove.

  2. It is pos­si­ble to write con­stant time and per­for­mance-crit­i­cal code like this in SPARK.

  3. Some Ada idioms are extreme­ly use­ful for cryp­to pro­gram­ming — par­tic­u­lar­ly the intrin­sic Shift and Rotate oper­a­tions for mod­u­lar types, prag­ma Inspection_​Point (for sani­tis­ing data), lim­it­ed types (to ensure pass-by-ref­er­ence and no copies of sen­si­tive data are ever made), first-class com­pos­ite types, and so on.

  4. Once again, we seen that type safe­ty” proof (aka Sil­ver lev­el” in SPARK) gets a lot of bang-for-the-buck in terms of assur­ance and also spot­ting func­tion­al bugs en-passant.

Under Fur­ther Work”, I plan to do some per­for­mance test­ing soon — pit­ting SPARK­Na­Cl against Tweet­Na­Cl on a small RISC‑V bare-met­al tar­get. I expect SPARK­Na­Cl will be a bit slow­er to start with, but (hav­ing proved type safe­ty), I expect the compiler’s opti­mi­sa­tion to per­form well, so we’ll see. Per­haps that’s anoth­er blog entry to come…

Posted in #SPARK    #Cryptography    #Formal Verification   

About Roderick Chapman

Roderick Chapman

Rod is Director of Protean Code Limited. He specializes in the design, implementation and verification of high-integrity software systems. For many years, Rod led the SPARK design team at Praxis and Altran in the UK, and continues to work closely with the SPARK team at AdaCore.