AdaCore Blog

CuBit: A General-Purpose Operating System in SPARK/Ada

by Jon Andrew

CuBit: A General-Purpose Operating System in SPARK/Ada #

Introduction #

Last year, I started evaluating programming languages for a formally-verified operating system. I've been developing software for a while, but only recently began work in high integrity software development and formal methods. There are several operating system projects, like the SeL4 microkernel and the Muen separation kernel, that make use of formal verification. But I was interested in using a formally-verified language to write a general-purpose OS - an environment for abstracting the underlying hardware while acting as an arbiter for running the normal applications we're used to.

Many of the languages used to write formally-verified software rely on a runtime environment or make heavy use of garbage-collection, which makes them difficult to use for operating system development. SPARK, with its support for zero-footprint runtimes and embedded devices, seemed to fit the bill. Ada's rich type system has a lot to offer the operating system developer, and there is good support for writing bare-metal code.

Overview #

CuBit is still in a very early stage of development. It boots into 64-bit mode on the x86-64, runs in the higher-half of memory, and is multicore (up to 128 logical processors) with preemptive multitasking. There are virtual memory object and physical memory allocators, and CuBit can start primitive user-mode processes with support for the fast x86-64 SYSCALL instruction for getting back into kernel mode. Filesystem drivers and VFS aren't ready yet, but progress is being made in this area. My intent with CuBit is not to create a formally-verified Unix clone. I've taken inspiration from a number of different operating systems, from embedded OSes like Xinu and QNX to BSD, Linux and even Windows and VMS.

Having said all that, CuBit doesn't really do all that much yet! But I've had a lot of fun and learned a lot developing what is there so far. I've spent a lot of time on documentation and trying to make the code easy for contributors - so let's dive in! Code is available at https://github.com/docandrew/CuBit.

First Steps #

Luke Guest's Ada Bare Bones article on osdev.org (https://wiki.osdev.org/Ada_Bare_bones) does a good job of illustrating some of the limitations that must be considered when developing an operating system in Ada, and was the starting point I used. One of the topics he devotes some time to is the use of pragma to disable Ada features that aren't suitable for OS code. I use the gnat.adc file to list them all. Many of the pragmas I specify in gnat.adc are self-explanatory, but it's worth talking about a few.

In gnat.adc:

pragma Suppress (Index_Check);
pragma Suppress (Range_Check);
pragma Suppress (Overflow_Check);
...
pragma Restrictions (No_Floating_Point);

I suppress the Index, Range and Overflow checks because we should use gnatprove at compile-time to demonstrate the absence of these types of errors. The beauty of SPARK is that we can show the absence of these errors at compile-time rather than rely on runtime exceptions. Now, just because we can prove absence of these errors doesn't mean CuBit is there yet. There is still a lot of work to be done in the proof department. There's a balance between adding new features to the OS and going back and trying to verify the work that was already done. Since this is a side-project for now, I find that usually new features win my attention. Having said that, I do enjoy going back, picking a file, and trying to get it to pass all of gnatprove's checks with no warnings or errors.

The next pragma disables floating-point math. The x87 FPU has to be explicitly initialized prior to use. Depending on where we do that in the OS (CuBit doesn't do it at all, yet!), we might get ourselves into trouble by accidentally performing a floating-point operation which would cause a hardware exception. This pragma prevents us from doing so.

Generally-speaking, use of floating-point in the OS is to be avoided anyhow, as it means that during a context switch (entering the OS from user code or vice-versa), we have to save the floating-point state of the user process and restore the kernel's floating-point state. If the user process doesn't make use of the FPU, then we can gain some efficiency by avoiding it in the kernel as well. I'm hoping CuBit can avoid floating-point math altogether in the kernel, but it remains to be seen whether that's possible long-term.

Since CuBit runs in 64-bit mode, we also have to consider SSE instructions. These are a little bit sneakier. Unlike floating-point operations, which are very easy to avoid using Ada's modular types, the GCC code generator can put SSE instructions in where you don't expect them for better performance. Since CuBit doesn't yet support SSE, we need to explicitly disable it with compiler flags in the .gpr project file:

In cubit.gpr:

package Compiler is
    for Default_Switches ("Ada") use
    (
        ...
        "-mno-sse"
        "-mno-sse2"
        ...
    );
end Compiler;

That one was learned the hard way, when I got an unexpected hardware exception and had to dig through the generated assembly to see what the problem was. Speaking of assembly...

Some Assembly Required #

In order to get to our SPARK code, a little bit of assembly code is necessary. I prefer Intel syntax vs the AT&T syntax commonly used in GCC, so use the yasm assembler in CuBit. Inline assembly in GNAT is still using the native AT&T-style syntax, so it can be seen in CuBit too, when we need to use native instructions from Ada for specific hardware functionality, like reading I/O ports, or an x86 model-specific register, shown here.

In x86.adb:

---------------------------------------------------------------------------
-- Read from a model-specific register (MSR)
---------------------------------------------------------------------------
function rdmsr(msraddr : in MSR) return Unsigned_64
is
    low : Unsigned_32;
    high : Unsigned_32;
begin
    Asm("rdmsr",
        Inputs  =>  Unsigned_32'Asm_Input("c", msraddr),
        Outputs => (Unsigned_32'Asm_Output("=a", low),
                    Unsigned_32'Asm_Output("=d", high)),
        Volatile => True);
    return (Shift_Left(Unsigned_64(high), 32) or Unsigned_64(low));
end rdmsr;

We aren't allowed to use inline-assembly in SPARK-Mode, so we've marked the whole x86 package body with SPARK_Mode => Off. However, we can still make use of SPARK to verify pre-conditions and post-conditions for these functions in the package specification. This is a technique that CuBit uses quite often, as we often have to do pointer arithmetic, unchecked conversions and what John Barnes calls "naughty peeking and poking" within a subprogram body.

Linking #

One interesting thing about CuBit and many operating systems is that the kernel is just an ELF object file, with the typical sections we'd expect in a normal application - like .text, .rodata, .data, .bss, etc. We need to be a little more careful about how these sections are linked (i.e. what address to use when referring to a symbol). CuBit uses a linker script to specify exactly where the sections should be linked, and then also provide a load address using the AT keyword to tell the bootloader where in physical memory to load it.

Since CuBit is 64-bit, we link the kernel in the upper 2GB of memory, and use the -mcmodel=kernel compiler flag to make sure the instructions generated by GNAT are using the appropriate addressing mode. With a multicore architecture, we consider the "bootstrap" core or BSP and then "application processors" or APs.

The APs will boot into real mode once we send them the appropriate interrupt, so the code there must be both physically in the first 64k of memory, as well as linked there. After some initial setup to get to protected mode, they boot in a similar manner to the BSP. We rename this section to .text_ap to distinguish it from the normal kernel .text section.

In linker.ld:

/* AP starting point */
AP_START        = 0x7000;

/* kernel load and link locations */
KERNEL_PHYS     = 0x00100000;
KERNEL_BASE     = 0xFFFFFFFF80000000;

...
SECTIONS
{
    . = AP_START;

    .text_ap : AT(AP_START) {
        stext_ap = .;
        *(.text_ap_entry)
        etext_ap = .;
    }

	. = KERNEL_PHYS + KERNEL_BASE;

    KERNEL_START_VIRT = .;

    .text : AT(KERNEL_PHYS)
    {
        stext = .;
        build/boot.o (.text .text.*)    /* need this at the front */
        *( EXCLUDE_FILE(build/init.o) .text .text.*)
    }
    
    . = ALIGN(4K);
    etext = .;

    .rodata :
    {
        srodata = .;
        *(.rodata .rodata.*)
    }
...

If you've never worked with linker scripts, this might look a little funny. Just remember that the . means the "current address". So starting at:

. = AP_START

we could say, "the current address is AP_START (or 0x7000)." Then, from that address, we define a section

.text_ap : AT(AP_START) {
    stext_ap = .;
    *(.text_ap_entry)
    etext_ap = .;
}

This means link the text_ap section starting from whatever the current address is, and then AT(AP_START) puts a load address of AP_START in the object file, which our bootloader will dutifully abide by. Note that there are what appear to be a couple of assignments: stext_ap = .; and etext_ap = .;. These symbols will get exported for use in our Ada code. However, we must always remember to take the address of a symbol, never refer to it directly. To ensure this, CuBit uses the Ada type system to help out:

In util.ads:

    ---------------------------------------------------------------------------
    -- Symbol is a useless type, used to prevent us from forgetting to use
    -- 'Address when referring to one.
    ---------------------------------------------------------------------------
    type Symbol is (USELESS) with Size => System.Word_Size;

In-between the stext_ap = .; and etext_ap = .; symbol exports, we have the cream filling of the .text_ap section. Here's where we specify what gets included in that section. In this case, it's everything marked as a .text_ap_entry section within individual object files - which in this case is just the compiled boot_ap.asm or AP entry code. We use the section directive in our assembly to specify what the object file section will be called:

In boot_ap.asm:

...

BITS 16

; we'll link this section down low, since it has to be in first
; 65535 bytes for real mode.
section .text_ap_entry
...

If we take a look at the object file with readelf we can see what's going on:

> readelf -a build/boot_ap.o
...
Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .shstrtab         STRTAB           0000000000000000  00000320
       000000000000009e  0000000000000000           0     0     0
  [ 2] .strtab           STRTAB           0000000000000000  000003c0
       000000000000009d  0000000000000000           0     0     0
  [ 3] .symtab           SYMTAB           0000000000000000  00000460
       0000000000000198  0000000000000018           2    15     8
  [ 4] .text             PROGBITS         0000000000000000  00000040
       0000000000000000  0000000000000000  AX       0     0     16
  [ 5] .text_ap_entry    PROGBITS         0000000000000000  00000040
       000000000000008e  0000000000000000   A       0     0     16
...

The .text_ap_entry section shown here will get placed in the .text_ap section we defined in our linker script. It seems yasm gives us a .text section we didn't ask for, but it's size 0, as expected, since boot_ap.asm does not specify a .text section.

We export some other useful symbols in the linker file too - like stext, etext, srodata, erodata, etc. This is just to mark the start and end of the various sections in our kernel image. We'll return to those in just a bit.

The other sections, for read-only data (.rodata), initialized static data (.data) and uninitialized static data (.bss) follow after the .text section. Note we set the "current address" using the ALIGN(4K) directives. I like to have these page-aligned (4k when using small pages on x86-64) for a couple of reasons. The first, less-important reason is that it makes debugging easier when I can quickly identify the range of memory that might be causing an error.

The second, more-important reason is that we can apply per-page access protections. For instance, marking the "no-execute" (NX) bit for data sections, and marking .rodata as read-only. In doing so, if we make a mistake and access something we shouldn't, we'll hopefully get a (easily debuggable) hardware exception rather than (hard to debug) undefined behavior. This is also a security enhancement.

This is performed in the Mem_Mgr package, when we map all physical memory in the system into the kernel's page tables with the appropriate flags. Here, Ada's type system helps us out again, since we can define ranges with the section start and end addresses we exported in the linker script:

In mem_mgr.ads:

...
    subtype kernelTextPages is Virtmem.PFN range
        Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.stext'Address))) .. 
        Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.etext'Address) - 1));

    subtype kernelROPages is Virtmem.PFN range
        Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.srodata'Address))) .. 
        Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.erodata'Address) - 1));

    subtype kernelRWPages is Virtmem.PFN range
        Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.sdata'Address))) .. 
        Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.ebss'Address) - 1));
...

This looks a bit unwieldy, but we're just defining Ada ranges for each ELF section, converting them to their physical address, and then to a PFN or "page frame number" to make them easier to work with. Later, we'll iterate through all the memory areas that the bootloader identifies, mapping pages appropriately. Ada makes it easy to check whether the page frame we're looking at belongs to one of those ranges:

In mem_mgr.adb:

    procedure determineFlagsAndMapFrame(frame : in Virtmem.PFN) is
...
    begin

        if frame in kernelTextPages then
            mapPage(fromPhys, 
                    toVirtLinear,
                    Virtmem.PG_KERNELDATARO,
                    kernelP4,
                    ok);

            if not ok then raise RemapException; end if;

            mapPage(fromPhys,
                    toVirtKernel,
                    Virtmem.PG_KERNELCODE,
                    kernelP4,
                    ok);
...

Here, if the PFN falls into the kernelTextPages range, then we'll map it into virtual memory twice. Once as read-only data (PG_KERNELDATARO) in what I call the "linear range", at 0xFFFF_8000_0000_0000 and up, where we map all physical memory in the system, and then again as in the upper 2GiB of memory at 0xFFFF_FFFF_8000_0000 and up, where it was linked and executes from. At boot, CuBit will display the memory addresses of the sections as they are linked:

...
Kernel sections:
text:     0xFFFFFFFF80100000-0xFFFFFFFF80127000
rodata:   0xFFFFFFFF80127000-0xFFFFFFFF8012B000
data:     0xFFFFFFFF8012B000-0xFFFFFFFF8012C000
bss:      0xFFFFFFFF8012C000-0xFFFFFFFF804300A8
...

Modifying the Runtime #

SPARK comes with a "zero footprint runtime" (rts-zfp) that you can find in the GNAT CE package. I've modified it very slightly from the original, so the CuBit repository has it's own copy. Much of the work that I've put into CuBit is getting the operating system to a point where it can support language and library features that we're all used to in our daily development, such as text I/O and memory allocation.

Allocators #

Since we have to provide all the memory allocation ourselves, CuBit has 2 physical memory allocators. The first is the BootAllocator. This uses static bitmaps with support for allocating a limited amount of memory using a relatively slow linear search.

The other is the BuddyAllocator. CuBit's buddy allocator is implemented as an array of circularly-linked lists of free blocks of power-of-2 multiples of the x86-64 small page size (4K). Whew, that was a mouthful!

I know what you're thinking: "circularly-linked lists are prohibited in SPARK." Well, yes. Here we get around that by overlaying the "free list record" on top of the underlying physical memory to be allocated. When removing the block from the free list (either because it was allocated, or because we're combining it with its buddy to form a larger block), we call the unlink procedure.

In buddyallocator.adb:

    procedure unlink(ord : in Order; addr : in Virtmem.PhysAddress) with
        SPARK_Mode => On,
        Pre  => freeLists(ord).numFreeBlocks > 0,
        Post => freeLists(ord).numFreeBlocks =
                freeLists(ord).numFreeBlocks'Old - 1
    is
        block : aliased FreeBlock with
            Import, Address => To_Address(addr);

        prevAddr : constant System.Address := block.prevBlock;
        nextAddr : constant System.Address := block.nextBlock;
    begin

        linkNeighbors:
        declare
            prevBlock : aliased FreeBlock with
                Import, Address => prevAddr;

            nextBlock : aliased FreeBlock with
                Import, Address => nextAddr;
        begin
            prevBlock.nextBlock := nextAddr;
            nextBlock.prevBlock := prevAddr;
        end linkNeighbors;

        -- decrement the free list count when we unlink somebody
        freeLists(ord).numFreeBlocks :=
            freeLists(ord).numFreeBlocks - 1;
    end unlink;

This makes it a bit harder to prove certain properties about the allocator, so might not be considered the best technique. I believe its an acceptable compromise here as it allows us to maintain SPARK_Mode in the subprogram body and allow proving other properties (absence of overflow, etc.) of the code.

CuBit enforces strict alignment of all block sizes, which causes us to sacrifice a little bit of memory to external fragmentation, but it makes the buddy calculations for splitting and rejoining blocks simpler and faster.

For object allocation, CuBit uses a curious feature of GNAT called the Simple_Storage_Pool pragma. I couldn't find much documentation about this, but was pleased to discover that we need little-to-no runtime support for this, but can still use the familiar new keyword to allocate objects at runtime. This might open up the use of tagged records and controlled types in CuBit, but needs more experimentation before I can say one way or the other.

The SlabAllocator implements a circularly-linked list of free objects. It works similarly to the BuddyAllocator, but instead of page order-aligned blocks of memory, it works on objects of a fixed size, with user-specified alignment. By allocating from and free-ing to the head of the list, we can get fast O(1) allocations and deallocations. We do give up some flexibility with the slab allocator, since the object size is fixed and must be specified ahead of time.

In SlabAllocator.ads:

    type FreeNode is
    record
        next : System.Address;
        prev : System.Address;
    end record with Size => 16 * 8;

    for FreeNode use
    record
        next at 0 range 0..63;
        prev at 8 range 0..63;
    end record;

    type Slab is limited record
        freeList    : FreeNode;

        numFree     : Integer := 0;
        capacity    : Integer := 0;

        blockOrder  : BuddyAllocator.Order;
        blockAddr   : Virtmem.PhysAddress;
        
        mutex       : aliased Spinlock.Spinlock;
        
        alignment   : System.Storage_Elements.Storage_Count;
        paddedSize  : System.Storage_Elements.Storage_Count;
        
        initialized : Boolean := False;
    end record;

    -- GNAT-specific pragma
    pragma Simple_Storage_Pool_Type(Slab);

Use of this allocator should look somewhat familiar. We need to initialize the slab allocator with a setup routine that will allocate underlying physical memory from the buddy allocator described earlier.

...
    objSlab : SlabAllocator.Slab;
    type myObjPtr is access myObject;
    for myObjPtr'Simple_Storage_Pool use objSlab;

    procedure free is new Ada.Unchecked_Deallocation(myObject, myObjPtr);
    obj : myObjPtr;

begin

    SlabAllocator.setup(objSlab, myObject'Size);
    obj := new Object;
...

Record Representation #

Note that I use both Size and a record representation clause for the FreeNode record in SlabAllocator. This is overkill here, but generally I like to use both, especially for records that might be overlaid on memory-mapped hardware, used directly by the CPU (like page tables), or overlaid on in-memory BIOS structures like the ACPI tables. It helps keep me honest so that when I add fields to a record the compiler will ensure that the size and representation match up. Consider something like the ACPI FADT structure:

In acpi.ads:

    ---------------------------------------------------------------------------
    -- FADT - Fixed ACPI Description Table.
    ---------------------------------------------------------------------------
    type FADTRecord is
    record
        header              : SDTRecordHeader;
        firmwareControl     : Unsigned_32;  -- ignored if exFirmwareControl present
        dsdt                : Unsigned_32;  -- ignored if exDsdt present
        reserved1           : Unsigned_8;
        powerMgmtProfile    : PowerManagementProfile;
        sciInterrupt        : Unsigned_16;
        smiCommand          : Unsigned_32;
        acpiEnable          : Unsigned_8;
        acpiDisable         : Unsigned_8;
        S4BIOSReq           : Unsigned_8;
        pStateControl       : Unsigned_8;
        PM1AEventBlock      : Unsigned_32;
        PM1BEventBlock      : Unsigned_32;
        PM1AControlBlock    : Unsigned_32;
        PM1BControlBlock    : Unsigned_32;
        PM2ControlBlock     : Unsigned_32;
        PMTimerBlock        : Unsigned_32;
        GPE0Block           : Unsigned_32;
        GPE1Block           : Unsigned_32;
        PM1EventLength      : Unsigned_8;
        PM1ControlLength    : Unsigned_8;
        PM2ControlLength    : Unsigned_8;
        PMTimerLength       : Unsigned_8;
        GPE0BlockLength     : Unsigned_8;
        GPE1BlockLength     : Unsigned_8;
        GPE1Base            : Unsigned_8;
        cStateControl       : Unsigned_8;
        pLevel2Latency      : Unsigned_16;
        pLevel3Latency      : Unsigned_16;
        flushSize           : Unsigned_16;
        flushStride         : Unsigned_16;
        dutyOffset          : Unsigned_8;
        dutyWidth           : Unsigned_8;
        dayAlarm            : Unsigned_8;
        monthAlarm          : Unsigned_8;
        century             : Unsigned_8;   -- RTC index into RTC RAM if not 0
        intelBootArch       : Unsigned_16;  -- IA-PC boot architecture flags
        reserved2           : Unsigned_8;   -- always 0
        flags               : Unsigned_32;  -- fixed feature flags
        resetRegister       : GenericAddressStructure;
        resetValue          : Unsigned_8;
        armBootArch         : Unsigned_16;
        fadtMinorVersion    : Unsigned_8;
        exFirmwareControl   : Unsigned_64;
        exDsdt              : Unsigned_64;
        exPM1AEventBlock    : GenericAddressStructure;
        exPM1BEventBlock    : GenericAddressStructure;
        exPM1AControlBlock  : GenericAddressStructure;
        exPM1BControlBlock  : GenericAddressStructure;
        exPM2ControlBlock   : GenericAddressStructure;
        exPMTimerBlock      : GenericAddressStructure;
        exGPE0Block         : GenericAddressStructure;
        exGPE1Block         : GenericAddressStructure;

        -- ACPI 6 fields (not supported yet)
        --sleepControlReg     : GenericAddressStructure;
        --sleepStatusReg      : GenericAddressStructure;
        --hypervisorVendor    : Unsigned_64;
    end record with Size => 244*8;

    for FADTRecord use
    record
        header              at 0   range 0..287;
        firmwareControl     at 36  range 0..31;
        dsdt                at 40  range 0..31;
        reserved1           at 44  range 0..7;
        powerMgmtProfile    at 45  range 0..7;
        sciInterrupt        at 46  range 0..15;
        smiCommand          at 48  range 0..31;
        acpiEnable          at 52  range 0..7;
        acpiDisable         at 53  range 0..7;
        S4BIOSReq           at 54  range 0..7;
        pStateControl       at 55  range 0..7;
        PM1AEventBlock      at 56  range 0..31;
        PM1BEventBlock      at 60  range 0..31;
        PM1AControlBlock    at 64  range 0..31;
        PM1BControlBlock    at 68  range 0..31;
        PM2ControlBlock     at 72  range 0..31;
        PMTimerBlock        at 76  range 0..31;
        GPE0Block           at 80  range 0..31;
        GPE1Block           at 84  range 0..31;
        PM1EventLength      at 88  range 0..7;
        PM1ControlLength    at 89  range 0..7;
        PM2ControlLength    at 90  range 0..7;
        PMTimerLength       at 91  range 0..7;
        GPE0BlockLength     at 92  range 0..7;
        GPE1BlockLength     at 93  range 0..7;
        GPE1Base            at 94  range 0..7;
        cStateControl       at 95  range 0..7;
        pLevel2Latency      at 96  range 0..15;
        pLevel3Latency      at 98  range 0..15;
        flushSize           at 100 range 0..15;
        flushStride         at 102 range 0..15;
        dutyOffset          at 104 range 0..7;
        dutyWidth           at 105 range 0..7;
        dayAlarm            at 106 range 0..7;
        monthAlarm          at 107 range 0..7;
        century             at 108 range 0..7;
        intelBootArch       at 109 range 0..15;
        reserved2           at 111 range 0..7;
        flags               at 112 range 0..31;
        resetRegister       at 116 range 0..95;
        resetValue          at 128 range 0..7;
        armBootArch         at 129 range 0..15;
        fadtMinorVersion    at 131 range 0..7;
        exFirmwareControl   at 132 range 0..63;
        exDsdt              at 140 range 0..63;
        exPM1AEventBlock    at 148 range 0..95;
        exPM1BEventBlock    at 160 range 0..95;
        exPM1AControlBlock  at 172 range 0..95;
        exPM1BControlBlock  at 184 range 0..95;
        exPM2ControlBlock   at 196 range 0..95;
        exPMTimerBlock      at 208 range 0..95;
        exGPE0Block         at 220 range 0..95;
        exGPE1Block         at 232 range 0..95;

        -- ACPI 6 fields
        --sleepControlReg     at 244 range 0..95;
        --sleepStatusReg      at 256 range 0..95;
        --hypervisorVendor    at 268 range 0..63;
    end record;

It's very easy to make mistakes in big records like this! Thankfully Ada's record representation clauses allow us to depend on the field alignments and sizes that we specify.

Per-CPU Data, Per-CPU Stacks, and Secondary Stacks #

Since CuBit is a multi-core operating system, we need to maintain separate call stacks for each CPU. The mechanism by which this is done is fairly primitive at this point. We just carve out a chunk of memory (8 MiB as of the time of writing, but I suspect we'll eventually want more) for the kernel stacks (STACK_BOTTOM to STACK_TOP). This is divided up by the maximum number of cores that CuBit is currently set to support (128), so we get a 64k-sized stack for each CPU present.

These are kernel-mode stacks, so call chains should be fairly short. Each of these stacks is also divided up into the primary stack, which grows downward, and then the Ada secondary stack, which grows upward. The secondary stack is not widely used, but it does work. It's set at a paltry 2048 bytes for now, in the System.Parameters.Runtime_Default_Sec_Stack_Size runtime package.

We need to set the stack pointer early on, so that we can call adainit to perform elaboration, and then our kmain function.

In boot.asm:

...
setup_bsp:
    ; Setup our kernel stack.
    mov rsp, qword (STACK_TOP)

    ; Add a stack canary to bottom of primary stack for CPU #0
    mov rax, qword (STACK_TOP - PER_CPU_STACK_SIZE + SEC_STACK_SIZE)
    mov rbx, 0xBAD_CA11_D37EC7ED
    mov [rax], rbx

    ; Save rdi, rsi so adainit doesn't clobber them
    push rdi
    push rsi

    ; Initialize with adainit for elaboration prior to entering Ada.
    mov rax, qword adainit
    call rax

    ; Restore arguments to kmain
    pop rsi
    pop rdi

    ; call into Ada code
    mov rax, qword kmain
    call rax
...

The secondary stack is then initialized with a call to System.Secondary_Stack.SS_Init with the address for the bottom of the CPU stack for the bootstrap processor/CPU #0.

Remember those APs that boot into real-mode? We need to set their initial stack pointers as well, once they get to 64-bit mode. When we tell those processors to boot, we set a global variable startingCPU in the Ada code prior to sending the interrupt that tells them to start.

We can use that to calculate where that CPU's per-CPU stack should live in the setup_ap:. Each stack size is (STACK_TOP - STACK_BOTTOM) / MAX_CPUS. Then we can set the stack for CPU N by setting its RSP (stack pointer) to STACK_TOP - (startingCPU * stack size).

This raises another issue. Multi-core code can use the primary stack easily, since each CPU has its own stack pointer register. However, the Ada secondary stack needs to be manipulated using data and a pointer kept in main memory. So how do we know which secondary stack to use when we're running code on an arbitrary CPU? It is not always obvious which CPU a piece of code is running on in a multi-core system.

CuBit uses a per-CPU data structure. It is implemented similar to the way that thread-local storage is, using the GS register to store an offset to the per-CPU data. This data structure includes information like the CPU number it represents and the currently running process ID. When code on a CPU attempts to use the secondary stack, it gets a pointer to the dedicated secondary stack for that CPU. This is obviously slower than accessing the RSP register for the primary stack.

In this case, CuBit actually keeps the per-CPU data on each of the primary stacks we set up earlier. Generally speaking its bad form to take the address of a stack variable, but here we create the per-CPU data record just prior to entering the scheduling subprogram which loops infinitely, so it is safe to access as though it were statically allocated.

When the secondary stack package goes to allocate, it calls __gnat_get_secondary_stack, which is the export name of the getSecondaryStack function in the PerCPUData package. We use this address in SS_Allocate, which is called behind the scenes whenever the secondary stack is used.

Scheduling and Tasking #

The CuBit scheduler is very primitive and very simple. It just uses a basic round-robin model for now, with a lot of work to be done for various process states like sleeping, etc. This is currently where I'm spending a lot of my time on CuBit. Context switching is theoretically just a matter of saving one register set and restoring another, and returning to the saved return address on the stack. x86 makes this a bit more challenging, and the details are messy. I'm still working through a lot of the finer points, but it mostly works. As of the time of writing, there's a small user-mode stub in init.asm. This is the first user-mode process, which will eventually perform the system call to execute whatever init framework is used.

SPARK can verify concurrent programs using the Ravenscar tasking model. This is ideal for bare-metal applications where the tasks are known ahead of time. However, in a general-purpose OS, we don't know what applications are going to be run, nor should we place unnecessary restrictions on them. There might be a good way to model user processes as Ada tasks in order to get some guarantees from the compiler, but I haven't worked out a way to do this, and it may in fact not be feasible.

The difficulty, as far as I can tell, lies in describing which parts of the code might be running concurrently and modeling their interactions. Between multiple cores, system calls, and interrupts, it gets hairy very quickly. Careful use of locking primitives is the typical route for dealing with this, but ensuring the absence of deadlock is difficult. If there are any SPARK experts reading this with some good ideas in this area, please let me know!

The Future #

CuBit is still in a very early stage of development. It's just been a side-project of mine for now, and frankly I'm a little embarrassed to write about it in it's current state! I suppose they say "if you aren't embarrassed by your first release than you've waited too long to release". I welcome contributions from the wider Ada/SPARK community. If, like me, you dream about formally-verified software taking over the world some day, then please check it out at https://github.com/docandrew/CuBit/ and start submitting those pull requests!

Posted in

About Jon Andrew

Jon Andrew is a Senior DevOps Engineer at B.E.A.T. LLC. Jon earned his degree in Computer Engineering from the University of Arizona and specialized in software engineering before joining the US Air Force, where as a pilot he flew special operations aircraft and instructed new student pilots. He has since earned his Masters in Cybersecurity through Syracuse University and works on a variety of defense and cybersecurity software engineering projects.