AdaCore Blog

CuBit: A General-Purpose Operating System in SPARK/Ada

CuBit: A General-Purpose Operating System in SPARK/Ada

by Jon Andrew (B.E.A.T. LLC.) Guest Author

CuBit: A Gen­er­al-Pur­pose Oper­at­ing Sys­tem in SPARK/​Ada #

Intro­duc­tion #

Last year, I start­ed eval­u­at­ing pro­gram­ming lan­guages for a for­mal­ly-ver­i­fied oper­at­ing sys­tem. I’ve been devel­op­ing soft­ware for a while, but only recent­ly began work in high integri­ty soft­ware devel­op­ment and for­mal meth­ods. There are sev­er­al oper­at­ing sys­tem projects, like the SeL4 micro­ker­nel and the Muen sep­a­ra­tion ker­nel, that make use of for­mal ver­i­fi­ca­tion. But I was inter­est­ed in using a for­mal­ly-ver­i­fied lan­guage to write a gen­er­al-pur­pose OS — an envi­ron­ment for abstract­ing the under­ly­ing hard­ware while act­ing as an arbiter for run­ning the nor­mal appli­ca­tions we’re used to.

Many of the lan­guages used to write for­mal­ly-ver­i­fied soft­ware rely on a run­time envi­ron­ment or make heavy use of garbage-col­lec­tion, which makes them dif­fi­cult to use for oper­at­ing sys­tem devel­op­ment. SPARK, with its sup­port for zero-foot­print run­times and embed­ded devices, seemed to fit the bill. Ada’s rich type sys­tem has a lot to offer the oper­at­ing sys­tem devel­op­er, and there is good sup­port for writ­ing bare-met­al code.

Overview #

CuBit is still in a very ear­ly stage of devel­op­ment. It boots into 64-bit mode on the x86-64, runs in the high­er-half of mem­o­ry, and is mul­ti­core (up to 128 log­i­cal proces­sors) with pre­emp­tive mul­ti­task­ing. There are vir­tu­al mem­o­ry object and phys­i­cal mem­o­ry allo­ca­tors, and CuBit can start prim­i­tive user-mode process­es with sup­port for the fast x86-64 SYSCALL instruc­tion for get­ting back into ker­nel mode. Filesys­tem dri­vers and VFS aren’t ready yet, but progress is being made in this area. My intent with CuBit is not to cre­ate a for­mal­ly-ver­i­fied Unix clone. I’ve tak­en inspi­ra­tion from a num­ber of dif­fer­ent oper­at­ing sys­tems, from embed­ded OSes like Xinu and QNX to BSD, Lin­ux and even Win­dows and VMS.

Hav­ing said all that, CuBit doesn’t real­ly do all that much yet! But I’ve had a lot of fun and learned a lot devel­op­ing what is there so far. I’ve spent a lot of time on doc­u­men­ta­tion and try­ing to make the code easy for con­trib­u­tors — so let’s dive in! Code is avail­able at https://​github​.com/​d​o​c​a​n​d​r​e​w​/​CuBit.

First Steps #

Luke Guest’s Ada Bare Bones arti­cle on osdev​.org (https://​wiki​.osdev​.org/​A​d​a​_​B​a​r​e​_​bones) does a good job of illus­trat­ing some of the lim­i­ta­tions that must be con­sid­ered when devel­op­ing an oper­at­ing sys­tem in Ada, and was the start­ing point I used. One of the top­ics he devotes some time to is the use of pragma to dis­able Ada fea­tures that aren’t suit­able for OS code. I use the gnat.adc file to list them all. Many of the prag­mas I spec­i­fy in gnat.adc are self-explana­to­ry, but it’s worth talk­ing about a few.

In gnat.adc:

pragma Suppress (Index_Check);
pragma Suppress (Range_Check);
pragma Suppress (Overflow_Check);
...
pragma Restrictions (No_Floating_Point);

I sup­press the Index, Range and Over­flow checks because we should use gnatprove at com­pile-time to demon­strate the absence of these types of errors. The beau­ty of SPARK is that we can show the absence of these errors at com­pile-time rather than rely on run­time excep­tions. Now, just because we can prove absence of these errors doesn’t mean CuBit is there yet. There is still a lot of work to be done in the proof depart­ment. There’s a bal­ance between adding new fea­tures to the OS and going back and try­ing to ver­i­fy the work that was already done. Since this is a side-project for now, I find that usu­al­ly new fea­tures win my atten­tion. Hav­ing said that, I do enjoy going back, pick­ing a file, and try­ing to get it to pass all of gnatproves checks with no warn­ings or errors.

The next prag­ma dis­ables float­ing-point math. The x87 FPU has to be explic­it­ly ini­tial­ized pri­or to use. Depend­ing on where we do that in the OS (CuBit doesn’t do it at all, yet!), we might get our­selves into trou­ble by acci­den­tal­ly per­form­ing a float­ing-point oper­a­tion which would cause a hard­ware excep­tion. This prag­ma pre­vents us from doing so.

Gen­er­al­ly-speak­ing, use of float­ing-point in the OS is to be avoid­ed any­how, as it means that dur­ing a con­text switch (enter­ing the OS from user code or vice-ver­sa), we have to save the float­ing-point state of the user process and restore the kernel’s float­ing-point state. If the user process doesn’t make use of the FPU, then we can gain some effi­cien­cy by avoid­ing it in the ker­nel as well. I’m hop­ing CuBit can avoid float­ing-point math alto­geth­er in the ker­nel, but it remains to be seen whether that’s pos­si­ble long-term.

Since CuBit runs in 64-bit mode, we also have to con­sid­er SSE instruc­tions. These are a lit­tle bit sneaki­er. Unlike float­ing-point oper­a­tions, which are very easy to avoid using Ada’s mod­u­lar types, the GCC code gen­er­a­tor can put SSE instruc­tions in where you don’t expect them for bet­ter per­for­mance. Since CuBit doesn’t yet sup­port SSE, we need to explic­it­ly dis­able it with com­pil­er flags in the .gpr project file:

In cubit.gpr:

package Compiler is
    for Default_Switches ("Ada") use
    (
        ...
        "-mno-sse"
        "-mno-sse2"
        ...
    );
end Compiler;

That one was learned the hard way, when I got an unex­pect­ed hard­ware excep­tion and had to dig through the gen­er­at­ed assem­bly to see what the prob­lem was. Speak­ing of assembly…

Some Assem­bly Required #

In order to get to our SPARK code, a lit­tle bit of assem­bly code is nec­es­sary. I pre­fer Intel syn­tax vs the AT&T syn­tax com­mon­ly used in GCC, so use the yasm assem­bler in CuBit. Inline assem­bly in GNAT is still using the native AT&T‑style syn­tax, so it can be seen in CuBit too, when we need to use native instruc­tions from Ada for spe­cif­ic hard­ware func­tion­al­i­ty, like read­ing I/O ports, or an x86 mod­el-spe­cif­ic reg­is­ter, shown here.

In x86.adb:

---------------------------------------------------------------------------
-- Read from a model-specific register (MSR)
---------------------------------------------------------------------------
function rdmsr(msraddr : in MSR) return Unsigned_64
is
    low : Unsigned_32;
    high : Unsigned_32;
begin
    Asm("rdmsr",
        Inputs  =>  Unsigned_32'Asm_Input("c", msraddr),
        Outputs => (Unsigned_32'Asm_Output("=a", low),
                    Unsigned_32'Asm_Output("=d", high)),
        Volatile => True);
    return (Shift_Left(Unsigned_64(high), 32) or Unsigned_64(low));
end rdmsr;

We aren’t allowed to use inline-assem­bly in SPARK-Mode, so we’ve marked the whole x86 pack­age body with SPARK_Mode => Off. How­ev­er, we can still make use of SPARK to ver­i­fy pre-con­di­tions and post-con­di­tions for these func­tions in the pack­age spec­i­fi­ca­tion. This is a tech­nique that CuBit uses quite often, as we often have to do point­er arith­metic, unchecked con­ver­sions and what John Barnes calls naughty peek­ing and pok­ing” with­in a sub­pro­gram body.

Link­ing #

One inter­est­ing thing about CuBit and many oper­at­ing sys­tems is that the ker­nel is just an ELF object file, with the typ­i­cal sec­tions we’d expect in a nor­mal appli­ca­tion — like .text, .rodata, .data, .bss, etc. We need to be a lit­tle more care­ful about how these sec­tions are linked (i.e. what address to use when refer­ring to a sym­bol). CuBit uses a link­er script to spec­i­fy exact­ly where the sec­tions should be linked, and then also pro­vide a load address using the AT key­word to tell the boot­loader where in phys­i­cal mem­o­ry to load it.

Since CuBit is 64-bit, we link the ker­nel in the upper 2GB of mem­o­ry, and use the -mcmodel=kernel com­pil­er flag to make sure the instruc­tions gen­er­at­ed by GNAT are using the appro­pri­ate address­ing mode. With a mul­ti­core archi­tec­ture, we con­sid­er the boot­strap” core or BSP and then appli­ca­tion proces­sors” or APs.

The APs will boot into real mode once we send them the appro­pri­ate inter­rupt, so the code there must be both phys­i­cal­ly in the first 64k of mem­o­ry, as well as linked there. After some ini­tial set­up to get to pro­tect­ed mode, they boot in a sim­i­lar man­ner to the BSP. We rename this sec­tion to .text_ap to dis­tin­guish it from the nor­mal ker­nel .text section.

In linker.ld:

/* AP starting point */
AP_START        = 0x7000;

/* kernel load and link locations */
KERNEL_PHYS     = 0x00100000;
KERNEL_BASE     = 0xFFFFFFFF80000000;

...
SECTIONS
{
    . = AP_START;

    .text_ap : AT(AP_START) {
        stext_ap = .;
        *(.text_ap_entry)
        etext_ap = .;
    }

	. = KERNEL_PHYS + KERNEL_BASE;

    KERNEL_START_VIRT = .;

    .text : AT(KERNEL_PHYS)
    {
        stext = .;
        build/boot.o (.text .text.*)    /* need this at the front */
        *( EXCLUDE_FILE(build/init.o) .text .text.*)
    }
    
    . = ALIGN(4K);
    etext = .;

    .rodata :
    {
        srodata = .;
        *(.rodata .rodata.*)
    }
...

If you’ve nev­er worked with link­er scripts, this might look a lit­tle fun­ny. Just remem­ber that the . means the cur­rent address”. So start­ing at:

. = AP_START

we could say, the cur­rent address is AP_START (or 0x7000).” Then, from that address, we define a section

.text_ap : AT(AP_START) {
    stext_ap = .;
    *(.text_ap_entry)
    etext_ap = .;
}

This means link the text_ap sec­tion start­ing from what­ev­er the cur­rent address is, and then AT(AP_START) puts a load address of AP_START in the object file, which our boot­loader will duti­ful­ly abide by. Note that there are what appear to be a cou­ple of assign­ments: stext_ap = .; and etext_ap = .;. These sym­bols will get export­ed for use in our Ada code. How­ev­er, we must always remem­ber to take the address of a sym­bol, nev­er refer to it direct­ly. To ensure this, CuBit uses the Ada type sys­tem to help out:

In util.ads:

---------------------------------------------------------------------------
    -- Symbol is a useless type, used to prevent us from forgetting to use
    -- 'Address when referring to one.
    ---------------------------------------------------------------------------
    type Symbol is (USELESS) with Size => System.Word_Size;

In-between the stext_ap = .; and etext_ap = .; sym­bol exports, we have the cream fill­ing of the .text_ap sec­tion. Here’s where we spec­i­fy what gets includ­ed in that sec­tion. In this case, it’s every­thing marked as a .text_ap_entry sec­tion with­in indi­vid­ual object files — which in this case is just the com­piled boot_ap.asm or AP entry code. We use the section direc­tive in our assem­bly to spec­i­fy what the object file sec­tion will be called:

In boot_ap.asm:

...

BITS 16

; we'll link this section down low, since it has to be in first
; 65535 bytes for real mode.
section .text_ap_entry
...

If we take a look at the object file with readelf we can see what’s going on:

> readelf -a build/boot_ap.o
...
Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .shstrtab         STRTAB           0000000000000000  00000320
       000000000000009e  0000000000000000           0     0     0
  [ 2] .strtab           STRTAB           0000000000000000  000003c0
       000000000000009d  0000000000000000           0     0     0
  [ 3] .symtab           SYMTAB           0000000000000000  00000460
       0000000000000198  0000000000000018           2    15     8
  [ 4] .text             PROGBITS         0000000000000000  00000040
       0000000000000000  0000000000000000  AX       0     0     16
  [ 5] .text_ap_entry    PROGBITS         0000000000000000  00000040
       000000000000008e  0000000000000000   A       0     0     16
...

The .text_ap_entry sec­tion shown here will get placed in the .text_ap sec­tion we defined in our link­er script. It seems yasm gives us a .text sec­tion we didn’t ask for, but it’s size 0, as expect­ed, since boot_ap.asm does not spec­i­fy a .text section.

We export some oth­er use­ful sym­bols in the link­er file too — like stext, etext, srodata, erodata, etc. This is just to mark the start and end of the var­i­ous sec­tions in our ker­nel image. We’ll return to those in just a bit.

The oth­er sec­tions, for read-only data (.rodata), ini­tial­ized sta­t­ic data (.data) and unini­tial­ized sta­t­ic data (.bss) fol­low after the .text sec­tion. Note we set the cur­rent address” using the ALIGN(4K) direc­tives. I like to have these page-aligned (4k when using small pages on x86-64) for a cou­ple of rea­sons. The first, less-impor­tant rea­son is that it makes debug­ging eas­i­er when I can quick­ly iden­ti­fy the range of mem­o­ry that might be caus­ing an error.

The sec­ond, more-impor­tant rea­son is that we can apply per-page access pro­tec­tions. For instance, mark­ing the no-exe­cute” (NX) bit for data sec­tions, and mark­ing .rodata as read-only. In doing so, if we make a mis­take and access some­thing we shouldn’t, we’ll hope­ful­ly get a (eas­i­ly debug­gable) hard­ware excep­tion rather than (hard to debug) unde­fined behav­ior. This is also a secu­ri­ty enhancement.

This is per­formed in the Mem_Mgr pack­age, when we map all phys­i­cal mem­o­ry in the sys­tem into the kernel’s page tables with the appro­pri­ate flags. Here, Ada’s type sys­tem helps us out again, since we can define ranges with the sec­tion start and end address­es we export­ed in the link­er script:

In mem_mgr.ads:

...
    subtype kernelTextPages is Virtmem.PFN range
        Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.stext'Address))) .. 
        Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.etext'Address) - 1));

    subtype kernelROPages is Virtmem.PFN range
        Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.srodata'Address))) .. 
        Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.erodata'Address) - 1));

    subtype kernelRWPages is Virtmem.PFN range
        Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.sdata'Address))) .. 
        Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.ebss'Address) - 1));
...

This looks a bit unwieldy, but we’re just defin­ing Ada ranges for each ELF sec­tion, con­vert­ing them to their phys­i­cal address, and then to a PFN or page frame num­ber” to make them eas­i­er to work with. Lat­er, we’ll iter­ate through all the mem­o­ry areas that the boot­loader iden­ti­fies, map­ping pages appro­pri­ate­ly. Ada makes it easy to check whether the page frame we’re look­ing at belongs to one of those ranges:

In mem_mgr.adb:

procedure determineFlagsAndMapFrame(frame : in Virtmem.PFN) is
...
    begin

        if frame in kernelTextPages then
            mapPage(fromPhys, 
                    toVirtLinear,
                    Virtmem.PG_KERNELDATARO,
                    kernelP4,
                    ok);

            if not ok then raise RemapException; end if;

            mapPage(fromPhys,
                    toVirtKernel,
                    Virtmem.PG_KERNELCODE,
                    kernelP4,
                    ok);
...

Here, if the PFN falls into the kernelTextPages range, then we’ll map it into vir­tu­al mem­o­ry twice. Once as read-only data (PG_KERNELDATARO) in what I call the lin­ear range”, at 0xFFFF_​8000_​0000_​0000 and up, where we map all phys­i­cal mem­o­ry in the sys­tem, and then again as in the upper 2GiB of mem­o­ry at 0xFFFF_​FFFF_​8000_​0000 and up, where it was linked and exe­cutes from. At boot, CuBit will dis­play the mem­o­ry address­es of the sec­tions as they are linked:

...
Kernel sections:
text:     0xFFFFFFFF80100000-0xFFFFFFFF80127000
rodata:   0xFFFFFFFF80127000-0xFFFFFFFF8012B000
data:     0xFFFFFFFF8012B000-0xFFFFFFFF8012C000
bss:      0xFFFFFFFF8012C000-0xFFFFFFFF804300A8
...

Mod­i­fy­ing the Run­time #

SPARK comes with a zero foot­print run­time” (rts-zfp) that you can find in the GNAT CE pack­age. I’ve mod­i­fied it very slight­ly from the orig­i­nal, so the CuBit repos­i­to­ry has it’s own copy. Much of the work that I’ve put into CuBit is get­ting the oper­at­ing sys­tem to a point where it can sup­port lan­guage and library fea­tures that we’re all used to in our dai­ly devel­op­ment, such as text I/O and mem­o­ry allocation.

Allo­ca­tors #

Since we have to pro­vide all the mem­o­ry allo­ca­tion our­selves, CuBit has 2 phys­i­cal mem­o­ry allo­ca­tors. The first is the BootAllocator. This uses sta­t­ic bitmaps with sup­port for allo­cat­ing a lim­it­ed amount of mem­o­ry using a rel­a­tive­ly slow lin­ear search.

The oth­er is the BuddyAllocator. CuBit’s bud­dy allo­ca­tor is imple­ment­ed as an array of cir­cu­lar­ly-linked lists of free blocks of pow­er-of‑2 mul­ti­ples of the x86-64 small page size (4K). Whew, that was a mouthful!

I know what you’re think­ing: cir­cu­lar­ly-linked lists are pro­hib­it­ed in SPARK.” Well, yes. Here we get around that by over­lay­ing the free list record” on top of the under­ly­ing phys­i­cal mem­o­ry to be allo­cat­ed. When remov­ing the block from the free list (either because it was allo­cat­ed, or because we’re com­bin­ing it with its bud­dy to form a larg­er block), we call the unlink procedure.

In buddyallocator.adb:

procedure unlink(ord : in Order; addr : in Virtmem.PhysAddress) with
        SPARK_Mode => On,
        Pre  => freeLists(ord).numFreeBlocks > 0,
        Post => freeLists(ord).numFreeBlocks =
                freeLists(ord).numFreeBlocks'Old - 1
    is
        block : aliased FreeBlock with
            Import, Address => To_Address(addr);

        prevAddr : constant System.Address := block.prevBlock;
        nextAddr : constant System.Address := block.nextBlock;
    begin

        linkNeighbors:
        declare
            prevBlock : aliased FreeBlock with
                Import, Address => prevAddr;

            nextBlock : aliased FreeBlock with
                Import, Address => nextAddr;
        begin
            prevBlock.nextBlock := nextAddr;
            nextBlock.prevBlock := prevAddr;
        end linkNeighbors;

        -- decrement the free list count when we unlink somebody
        freeLists(ord).numFreeBlocks :=
            freeLists(ord).numFreeBlocks - 1;
    end unlink;

This makes it a bit hard­er to prove cer­tain prop­er­ties about the allo­ca­tor, so might not be con­sid­ered the best tech­nique. I believe its an accept­able com­pro­mise here as it allows us to main­tain SPARK_​Mode in the sub­pro­gram body and allow prov­ing oth­er prop­er­ties (absence of over­flow, etc.) of the code.

CuBit enforces strict align­ment of all block sizes, which caus­es us to sac­ri­fice a lit­tle bit of mem­o­ry to exter­nal frag­men­ta­tion, but it makes the bud­dy cal­cu­la­tions for split­ting and rejoin­ing blocks sim­pler and faster.

For object allo­ca­tion, CuBit uses a curi­ous fea­ture of GNAT called the Simple_Storage_Pool prag­ma. I couldn’t find much doc­u­men­ta­tion about this, but was pleased to dis­cov­er that we need lit­tle-to-no run­time sup­port for this, but can still use the famil­iar new key­word to allo­cate objects at run­time. This might open up the use of tagged records and con­trolled types in CuBit, but needs more exper­i­men­ta­tion before I can say one way or the other.

The SlabAllocator imple­ments a cir­cu­lar­ly-linked list of free objects. It works sim­i­lar­ly to the BuddyAllocator, but instead of page order-aligned blocks of mem­o­ry, it works on objects of a fixed size, with user-spec­i­fied align­ment. By allo­cat­ing from and free-ing to the head of the list, we can get fast O(1) allo­ca­tions and deal­lo­ca­tions. We do give up some flex­i­bil­i­ty with the slab allo­ca­tor, since the object size is fixed and must be spec­i­fied ahead of time.

In SlabAllocator.ads:

type FreeNode is
    record
        next : System.Address;
        prev : System.Address;
    end record with Size => 16 * 8;

    for FreeNode use
    record
        next at 0 range 0..63;
        prev at 8 range 0..63;
    end record;

    type Slab is limited record
        freeList    : FreeNode;

        numFree     : Integer := 0;
        capacity    : Integer := 0;

        blockOrder  : BuddyAllocator.Order;
        blockAddr   : Virtmem.PhysAddress;
        
        mutex       : aliased Spinlock.Spinlock;
        
        alignment   : System.Storage_Elements.Storage_Count;
        paddedSize  : System.Storage_Elements.Storage_Count;
        
        initialized : Boolean := False;
    end record;

    -- GNAT-specific pragma
    pragma Simple_Storage_Pool_Type(Slab);

Use of this allo­ca­tor should look some­what famil­iar. We need to ini­tial­ize the slab allo­ca­tor with a set­up rou­tine that will allo­cate under­ly­ing phys­i­cal mem­o­ry from the bud­dy allo­ca­tor described earlier.

...
    objSlab : SlabAllocator.Slab;
    type myObjPtr is access myObject;
    for myObjPtr'Simple_Storage_Pool use objSlab;

    procedure free is new Ada.Unchecked_Deallocation(myObject, myObjPtr);
    obj : myObjPtr;

begin

    SlabAllocator.setup(objSlab, myObject'Size);
    obj := new Object;
...

Record Rep­re­sen­ta­tion #

Note that I use both Size and a record rep­re­sen­ta­tion clause for the FreeNode record in SlabAllocator. This is overkill here, but gen­er­al­ly I like to use both, espe­cial­ly for records that might be over­laid on mem­o­ry-mapped hard­ware, used direct­ly by the CPU (like page tables), or over­laid on in-mem­o­ry BIOS struc­tures like the ACPI tables. It helps keep me hon­est so that when I add fields to a record the com­pil­er will ensure that the size and rep­re­sen­ta­tion match up. Con­sid­er some­thing like the ACPI FADT structure:

In acpi.ads:

---------------------------------------------------------------------------
    -- FADT - Fixed ACPI Description Table.
    ---------------------------------------------------------------------------
    type FADTRecord is
    record
        header              : SDTRecordHeader;
        firmwareControl     : Unsigned_32;  -- ignored if exFirmwareControl present
        dsdt                : Unsigned_32;  -- ignored if exDsdt present
        reserved1           : Unsigned_8;
        powerMgmtProfile    : PowerManagementProfile;
        sciInterrupt        : Unsigned_16;
        smiCommand          : Unsigned_32;
        acpiEnable          : Unsigned_8;
        acpiDisable         : Unsigned_8;
        S4BIOSReq           : Unsigned_8;
        pStateControl       : Unsigned_8;
        PM1AEventBlock      : Unsigned_32;
        PM1BEventBlock      : Unsigned_32;
        PM1AControlBlock    : Unsigned_32;
        PM1BControlBlock    : Unsigned_32;
        PM2ControlBlock     : Unsigned_32;
        PMTimerBlock        : Unsigned_32;
        GPE0Block           : Unsigned_32;
        GPE1Block           : Unsigned_32;
        PM1EventLength      : Unsigned_8;
        PM1ControlLength    : Unsigned_8;
        PM2ControlLength    : Unsigned_8;
        PMTimerLength       : Unsigned_8;
        GPE0BlockLength     : Unsigned_8;
        GPE1BlockLength     : Unsigned_8;
        GPE1Base            : Unsigned_8;
        cStateControl       : Unsigned_8;
        pLevel2Latency      : Unsigned_16;
        pLevel3Latency      : Unsigned_16;
        flushSize           : Unsigned_16;
        flushStride         : Unsigned_16;
        dutyOffset          : Unsigned_8;
        dutyWidth           : Unsigned_8;
        dayAlarm            : Unsigned_8;
        monthAlarm          : Unsigned_8;
        century             : Unsigned_8;   -- RTC index into RTC RAM if not 0
        intelBootArch       : Unsigned_16;  -- IA-PC boot architecture flags
        reserved2           : Unsigned_8;   -- always 0
        flags               : Unsigned_32;  -- fixed feature flags
        resetRegister       : GenericAddressStructure;
        resetValue          : Unsigned_8;
        armBootArch         : Unsigned_16;
        fadtMinorVersion    : Unsigned_8;
        exFirmwareControl   : Unsigned_64;
        exDsdt              : Unsigned_64;
        exPM1AEventBlock    : GenericAddressStructure;
        exPM1BEventBlock    : GenericAddressStructure;
        exPM1AControlBlock  : GenericAddressStructure;
        exPM1BControlBlock  : GenericAddressStructure;
        exPM2ControlBlock   : GenericAddressStructure;
        exPMTimerBlock      : GenericAddressStructure;
        exGPE0Block         : GenericAddressStructure;
        exGPE1Block         : GenericAddressStructure;

        -- ACPI 6 fields (not supported yet)
        --sleepControlReg     : GenericAddressStructure;
        --sleepStatusReg      : GenericAddressStructure;
        --hypervisorVendor    : Unsigned_64;
    end record with Size => 244*8;

    for FADTRecord use
    record
        header              at 0   range 0..287;
        firmwareControl     at 36  range 0..31;
        dsdt                at 40  range 0..31;
        reserved1           at 44  range 0..7;
        powerMgmtProfile    at 45  range 0..7;
        sciInterrupt        at 46  range 0..15;
        smiCommand          at 48  range 0..31;
        acpiEnable          at 52  range 0..7;
        acpiDisable         at 53  range 0..7;
        S4BIOSReq           at 54  range 0..7;
        pStateControl       at 55  range 0..7;
        PM1AEventBlock      at 56  range 0..31;
        PM1BEventBlock      at 60  range 0..31;
        PM1AControlBlock    at 64  range 0..31;
        PM1BControlBlock    at 68  range 0..31;
        PM2ControlBlock     at 72  range 0..31;
        PMTimerBlock        at 76  range 0..31;
        GPE0Block           at 80  range 0..31;
        GPE1Block           at 84  range 0..31;
        PM1EventLength      at 88  range 0..7;
        PM1ControlLength    at 89  range 0..7;
        PM2ControlLength    at 90  range 0..7;
        PMTimerLength       at 91  range 0..7;
        GPE0BlockLength     at 92  range 0..7;
        GPE1BlockLength     at 93  range 0..7;
        GPE1Base            at 94  range 0..7;
        cStateControl       at 95  range 0..7;
        pLevel2Latency      at 96  range 0..15;
        pLevel3Latency      at 98  range 0..15;
        flushSize           at 100 range 0..15;
        flushStride         at 102 range 0..15;
        dutyOffset          at 104 range 0..7;
        dutyWidth           at 105 range 0..7;
        dayAlarm            at 106 range 0..7;
        monthAlarm          at 107 range 0..7;
        century             at 108 range 0..7;
        intelBootArch       at 109 range 0..15;
        reserved2           at 111 range 0..7;
        flags               at 112 range 0..31;
        resetRegister       at 116 range 0..95;
        resetValue          at 128 range 0..7;
        armBootArch         at 129 range 0..15;
        fadtMinorVersion    at 131 range 0..7;
        exFirmwareControl   at 132 range 0..63;
        exDsdt              at 140 range 0..63;
        exPM1AEventBlock    at 148 range 0..95;
        exPM1BEventBlock    at 160 range 0..95;
        exPM1AControlBlock  at 172 range 0..95;
        exPM1BControlBlock  at 184 range 0..95;
        exPM2ControlBlock   at 196 range 0..95;
        exPMTimerBlock      at 208 range 0..95;
        exGPE0Block         at 220 range 0..95;
        exGPE1Block         at 232 range 0..95;

        -- ACPI 6 fields
        --sleepControlReg     at 244 range 0..95;
        --sleepStatusReg      at 256 range 0..95;
        --hypervisorVendor    at 268 range 0..63;
    end record;

It’s very easy to make mis­takes in big records like this! Thank­ful­ly Ada’s record rep­re­sen­ta­tion claus­es allow us to depend on the field align­ments and sizes that we specify.

Per-CPU Data, Per-CPU Stacks, and Sec­ondary Stacks #

Since CuBit is a mul­ti-core oper­at­ing sys­tem, we need to main­tain sep­a­rate call stacks for each CPU. The mech­a­nism by which this is done is fair­ly prim­i­tive at this point. We just carve out a chunk of mem­o­ry (8 MiB as of the time of writ­ing, but I sus­pect we’ll even­tu­al­ly want more) for the ker­nel stacks (STACK_BOTTOM to STACK_TOP). This is divid­ed up by the max­i­mum num­ber of cores that CuBit is cur­rent­ly set to sup­port (128), so we get a 64k-sized stack for each CPU present.

These are ker­nel-mode stacks, so call chains should be fair­ly short. Each of these stacks is also divid­ed up into the pri­ma­ry stack, which grows down­ward, and then the Ada sec­ondary stack, which grows upward. The sec­ondary stack is not wide­ly used, but it does work. It’s set at a pal­try 2048 bytes for now, in the System.Parameters.Runtime_Default_Sec_Stack_Size run­time package.

We need to set the stack point­er ear­ly on, so that we can call adainit to per­form elab­o­ra­tion, and then our kmain function.

In boot.asm:

...
setup_bsp:
    ; Setup our kernel stack.
    mov rsp, qword (STACK_TOP)

    ; Add a stack canary to bottom of primary stack for CPU #0
    mov rax, qword (STACK_TOP - PER_CPU_STACK_SIZE + SEC_STACK_SIZE)
    mov rbx, 0xBAD_CA11_D37EC7ED
    mov [rax], rbx

    ; Save rdi, rsi so adainit doesn't clobber them
    push rdi
    push rsi

    ; Initialize with adainit for elaboration prior to entering Ada.
    mov rax, qword adainit
    call rax

    ; Restore arguments to kmain
    pop rsi
    pop rdi

    ; call into Ada code
    mov rax, qword kmain
    call rax
...

The sec­ondary stack is then ini­tial­ized with a call to System.Secondary_Stack.SS_Init with the address for the bot­tom of the CPU stack for the boot­strap processor/​CPU #0.

Remem­ber those APs that boot into real-mode? We need to set their ini­tial stack point­ers as well, once they get to 64-bit mode. When we tell those proces­sors to boot, we set a glob­al vari­able startingCPU in the Ada code pri­or to send­ing the inter­rupt that tells them to start.

We can use that to cal­cu­late where that CPU’s per-CPU stack should live in the setup_ap:. Each stack size is (STACK_TOP - STACK_BOTTOM) / MAX_CPUS. Then we can set the stack for CPU N by set­ting its RSP (stack point­er) to STACK_TOP - (startingCPU * stack size).

This rais­es anoth­er issue. Mul­ti-core code can use the pri­ma­ry stack eas­i­ly, since each CPU has its own stack point­er reg­is­ter. How­ev­er, the Ada sec­ondary stack needs to be manip­u­lat­ed using data and a point­er kept in main mem­o­ry. So how do we know which sec­ondary stack to use when we’re run­ning code on an arbi­trary CPU? It is not always obvi­ous which CPU a piece of code is run­ning on in a mul­ti-core system.

CuBit uses a per-CPU data struc­ture. It is imple­ment­ed sim­i­lar to the way that thread-local stor­age is, using the GS reg­is­ter to store an off­set to the per-CPU data. This data struc­ture includes infor­ma­tion like the CPU num­ber it rep­re­sents and the cur­rent­ly run­ning process ID. When code on a CPU attempts to use the sec­ondary stack, it gets a point­er to the ded­i­cat­ed sec­ondary stack for that CPU. This is obvi­ous­ly slow­er than access­ing the RSP reg­is­ter for the pri­ma­ry stack.

In this case, CuBit actu­al­ly keeps the per-CPU data on each of the pri­ma­ry stacks we set up ear­li­er. Gen­er­al­ly speak­ing its bad form to take the address of a stack vari­able, but here we cre­ate the per-CPU data record just pri­or to enter­ing the sched­ul­ing sub­pro­gram which loops infi­nite­ly, so it is safe to access as though it were sta­t­i­cal­ly allocated.

When the sec­ondary stack pack­age goes to allo­cate, it calls __gnat_get_secondary_stack, which is the export name of the getSecondaryStack func­tion in the PerCPUData pack­age. We use this address in SS_Allocate, which is called behind the scenes when­ev­er the sec­ondary stack is used.

Sched­ul­ing and Task­ing #

The CuBit sched­uler is very prim­i­tive and very sim­ple. It just uses a basic round-robin mod­el for now, with a lot of work to be done for var­i­ous process states like sleep­ing, etc. This is cur­rent­ly where I’m spend­ing a lot of my time on CuBit. Con­text switch­ing is the­o­ret­i­cal­ly just a mat­ter of sav­ing one reg­is­ter set and restor­ing anoth­er, and return­ing to the saved return address on the stack. x86 makes this a bit more chal­leng­ing, and the details are messy. I’m still work­ing through a lot of the fin­er points, but it most­ly works. As of the time of writ­ing, there’s a small user-mode stub in init.asm. This is the first user-mode process, which will even­tu­al­ly per­form the sys­tem call to exe­cute what­ev­er init frame­work is used.

SPARK can ver­i­fy con­cur­rent pro­grams using the Raven­scar task­ing mod­el. This is ide­al for bare-met­al appli­ca­tions where the tasks are known ahead of time. How­ev­er, in a gen­er­al-pur­pose OS, we don’t know what appli­ca­tions are going to be run, nor should we place unnec­es­sary restric­tions on them. There might be a good way to mod­el user process­es as Ada tasks in order to get some guar­an­tees from the com­pil­er, but I haven’t worked out a way to do this, and it may in fact not be feasible.

The dif­fi­cul­ty, as far as I can tell, lies in describ­ing which parts of the code might be run­ning con­cur­rent­ly and mod­el­ing their inter­ac­tions. Between mul­ti­ple cores, sys­tem calls, and inter­rupts, it gets hairy very quick­ly. Care­ful use of lock­ing prim­i­tives is the typ­i­cal route for deal­ing with this, but ensur­ing the absence of dead­lock is dif­fi­cult. If there are any SPARK experts read­ing this with some good ideas in this area, please let me know!

The Future #

CuBit is still in a very ear­ly stage of devel­op­ment. It’s just been a side-project of mine for now, and frankly I’m a lit­tle embar­rassed to write about it in it’s cur­rent state! I sup­pose they say if you aren’t embar­rassed by your first release than you’ve wait­ed too long to release”. I wel­come con­tri­bu­tions from the wider Ada/​SPARK com­mu­ni­ty. If, like me, you dream about for­mal­ly-ver­i­fied soft­ware tak­ing over the world some day, then please check it out at https://​github​.com/​d​o​c​a​n​d​r​e​w​/​C​uBit/ and start sub­mit­ting those pull requests!

Posted in

About Jon Andrew

Jon Andrew

Jon Andrew is a Senior DevOps Engineer at B.E.A.T. LLC. Jon earned his degree in Computer Engineering from the University of Arizona and specialized in software engineering before joining the US Air Force, where as a pilot he flew special operations aircraft and instructed new student pilots. He has since earned his Masters in Cybersecurity through Syracuse University and works on a variety of defense and cybersecurity software engineering projects.