CuBit: A General-Purpose Operating System in SPARK/Ada
by Jon Andrew (B.E.A.T. LLC.) –
CuBit: A General-Purpose Operating System in SPARK/Ada #
Introduction #
Last year, I started evaluating programming languages for a formally-verified operating system. I’ve been developing software for a while, but only recently began work in high integrity software development and formal methods. There are several operating system projects, like the SeL4 microkernel and the Muen separation kernel, that make use of formal verification. But I was interested in using a formally-verified language to write a general-purpose OS — an environment for abstracting the underlying hardware while acting as an arbiter for running the normal applications we’re used to.
Many of the languages used to write formally-verified software rely on a runtime environment or make heavy use of garbage-collection, which makes them difficult to use for operating system development. SPARK, with its support for zero-footprint runtimes and embedded devices, seemed to fit the bill. Ada’s rich type system has a lot to offer the operating system developer, and there is good support for writing bare-metal code.
Overview #
CuBit is still in a very early stage of development. It boots into 64-bit mode on the x86-64, runs in the higher-half of memory, and is multicore (up to 128 logical processors) with preemptive multitasking. There are virtual memory object and physical memory allocators, and CuBit can start primitive user-mode processes with support for the fast x86-64 SYSCALL instruction for getting back into kernel mode. Filesystem drivers and VFS aren’t ready yet, but progress is being made in this area. My intent with CuBit is not to create a formally-verified Unix clone. I’ve taken inspiration from a number of different operating systems, from embedded OSes like Xinu and QNX to BSD, Linux and even Windows and VMS.
Having said all that, CuBit doesn’t really do all that much yet! But I’ve had a lot of fun and learned a lot developing what is there so far. I’ve spent a lot of time on documentation and trying to make the code easy for contributors — so let’s dive in! Code is available at https://github.com/docandrew/CuBit.
First Steps #
Luke Guest’s Ada Bare Bones article on osdev.org (https://wiki.osdev.org/Ada_Bare_bones) does a good job of illustrating some of the limitations that must be considered when developing an operating system in Ada, and was the starting point I used. One of the topics he devotes some time to is the use of pragma
to disable Ada features that aren’t suitable for OS code. I use the gnat.adc
file to list them all. Many of the pragmas I specify in gnat.adc
are self-explanatory, but it’s worth talking about a few.
In gnat.adc
:
pragma Suppress (Index_Check);
pragma Suppress (Range_Check);
pragma Suppress (Overflow_Check);
...
pragma Restrictions (No_Floating_Point);
I suppress the Index, Range and Overflow checks because we should use gnatprove
at compile-time to demonstrate the absence of these types of errors. The beauty of SPARK is that we can show the absence of these errors at compile-time rather than rely on runtime exceptions. Now, just because we can prove absence of these errors doesn’t mean CuBit is there yet. There is still a lot of work to be done in the proof department. There’s a balance between adding new features to the OS and going back and trying to verify the work that was already done. Since this is a side-project for now, I find that usually new features win my attention. Having said that, I do enjoy going back, picking a file, and trying to get it to pass all of gnatprove
’s checks with no warnings or errors.
The next pragma disables floating-point math. The x87 FPU has to be explicitly initialized prior to use. Depending on where we do that in the OS (CuBit doesn’t do it at all, yet!), we might get ourselves into trouble by accidentally performing a floating-point operation which would cause a hardware exception. This pragma prevents us from doing so.
Generally-speaking, use of floating-point in the OS is to be avoided anyhow, as it means that during a context switch (entering the OS from user code or vice-versa), we have to save the floating-point state of the user process and restore the kernel’s floating-point state. If the user process doesn’t make use of the FPU, then we can gain some efficiency by avoiding it in the kernel as well. I’m hoping CuBit can avoid floating-point math altogether in the kernel, but it remains to be seen whether that’s possible long-term.
Since CuBit runs in 64-bit mode, we also have to consider SSE instructions. These are a little bit sneakier. Unlike floating-point operations, which are very easy to avoid using Ada’s modular types, the GCC code generator can put SSE instructions in where you don’t expect them for better performance. Since CuBit doesn’t yet support SSE, we need to explicitly disable it with compiler flags in the .gpr project file:
In cubit.gpr
:
package Compiler is
for Default_Switches ("Ada") use
(
...
"-mno-sse"
"-mno-sse2"
...
);
end Compiler;
That one was learned the hard way, when I got an unexpected hardware exception and had to dig through the generated assembly to see what the problem was. Speaking of assembly…
Some Assembly Required #
In order to get to our SPARK code, a little bit of assembly code is necessary. I prefer Intel syntax vs the AT&T syntax commonly used in GCC, so use the yasm
assembler in CuBit. Inline assembly in GNAT is still using the native AT&T‑style syntax, so it can be seen in CuBit too, when we need to use native instructions from Ada for specific hardware functionality, like reading I/O ports, or an x86 model-specific register, shown here.
In x86.adb
:
---------------------------------------------------------------------------
-- Read from a model-specific register (MSR)
---------------------------------------------------------------------------
function rdmsr(msraddr : in MSR) return Unsigned_64
is
low : Unsigned_32;
high : Unsigned_32;
begin
Asm("rdmsr",
Inputs => Unsigned_32'Asm_Input("c", msraddr),
Outputs => (Unsigned_32'Asm_Output("=a", low),
Unsigned_32'Asm_Output("=d", high)),
Volatile => True);
return (Shift_Left(Unsigned_64(high), 32) or Unsigned_64(low));
end rdmsr;
We aren’t allowed to use inline-assembly in SPARK-Mode, so we’ve marked the whole x86 package body with SPARK_Mode => Off
. However, we can still make use of SPARK to verify pre-conditions and post-conditions for these functions in the package specification. This is a technique that CuBit uses quite often, as we often have to do pointer arithmetic, unchecked conversions and what John Barnes calls “naughty peeking and poking” within a subprogram body.
Linking #
One interesting thing about CuBit and many operating systems is that the kernel is just an ELF object file, with the typical sections we’d expect in a normal application — like .text
, .rodata
, .data
, .bss
, etc. We need to be a little more careful about how these sections are linked (i.e. what address to use when referring to a symbol). CuBit uses a linker script to specify exactly where the sections should be linked, and then also provide a load address using the AT
keyword to tell the bootloader where in physical memory to load it.
Since CuBit is 64-bit, we link the kernel in the upper 2GB of memory, and use the -mcmodel=kernel
compiler flag to make sure the instructions generated by GNAT are using the appropriate addressing mode. With a multicore architecture, we consider the “bootstrap” core or BSP and then “application processors” or APs.
The APs will boot into real mode once we send them the appropriate interrupt, so the code there must be both physically in the first 64k of memory, as well as linked there. After some initial setup to get to protected mode, they boot in a similar manner to the BSP. We rename this section to .text_ap
to distinguish it from the normal kernel .text
section.
In linker.ld
:
/* AP starting point */
AP_START = 0x7000;
/* kernel load and link locations */
KERNEL_PHYS = 0x00100000;
KERNEL_BASE = 0xFFFFFFFF80000000;
...
SECTIONS
{
. = AP_START;
.text_ap : AT(AP_START) {
stext_ap = .;
*(.text_ap_entry)
etext_ap = .;
}
. = KERNEL_PHYS + KERNEL_BASE;
KERNEL_START_VIRT = .;
.text : AT(KERNEL_PHYS)
{
stext = .;
build/boot.o (.text .text.*) /* need this at the front */
*( EXCLUDE_FILE(build/init.o) .text .text.*)
}
. = ALIGN(4K);
etext = .;
.rodata :
{
srodata = .;
*(.rodata .rodata.*)
}
...
If you’ve never worked with linker scripts, this might look a little funny. Just remember that the .
means the “current address”. So starting at:
. = AP_START
we could say, “the current address is AP_START
(or 0x7000
).” Then, from that address, we define a section
.text_ap : AT(AP_START) {
stext_ap = .;
*(.text_ap_entry)
etext_ap = .;
}
This means link the text_ap
section starting from whatever the current address is, and then AT(AP_START)
puts a load address of AP_START
in the object file, which our bootloader will dutifully abide by. Note that there are what appear to be a couple of assignments: stext_ap = .;
and etext_ap = .;
. These symbols will get exported for use in our Ada code. However, we must always remember to take the address of a symbol, never refer to it directly. To ensure this, CuBit uses the Ada type system to help out:
In util.ads
:
---------------------------------------------------------------------------
-- Symbol is a useless type, used to prevent us from forgetting to use
-- 'Address when referring to one.
---------------------------------------------------------------------------
type Symbol is (USELESS) with Size => System.Word_Size;
In-between the stext_ap = .;
and etext_ap = .;
symbol exports, we have the cream filling of the .text_ap
section. Here’s where we specify what gets included in that section. In this case, it’s everything marked as a .text_ap_entry
section within individual object files — which in this case is just the compiled boot_ap.asm
or AP entry code. We use the section
directive in our assembly to specify what the object file section will be called:
In boot_ap.asm
:
...
BITS 16
; we'll link this section down low, since it has to be in first
; 65535 bytes for real mode.
section .text_ap_entry
...
If we take a look at the object file with readelf
we can see what’s going on:
> readelf -a build/boot_ap.o
...
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .shstrtab STRTAB 0000000000000000 00000320
000000000000009e 0000000000000000 0 0 0
[ 2] .strtab STRTAB 0000000000000000 000003c0
000000000000009d 0000000000000000 0 0 0
[ 3] .symtab SYMTAB 0000000000000000 00000460
0000000000000198 0000000000000018 2 15 8
[ 4] .text PROGBITS 0000000000000000 00000040
0000000000000000 0000000000000000 AX 0 0 16
[ 5] .text_ap_entry PROGBITS 0000000000000000 00000040
000000000000008e 0000000000000000 A 0 0 16
...
The .text_ap_entry
section shown here will get placed in the .text_ap
section we defined in our linker script. It seems yasm
gives us a .text
section we didn’t ask for, but it’s size 0, as expected, since boot_ap.asm
does not specify a .text
section.
We export some other useful symbols in the linker file too — like stext
, etext
, srodata
, erodata
, etc. This is just to mark the start and end of the various sections in our kernel image. We’ll return to those in just a bit.
The other sections, for read-only data (.rodata
), initialized static data (.data
) and uninitialized static data (.bss
) follow after the .text
section. Note we set the “current address” using the ALIGN(4K) directives. I like to have these page-aligned (4k when using small pages on x86-64) for a couple of reasons. The first, less-important reason is that it makes debugging easier when I can quickly identify the range of memory that might be causing an error.
The second, more-important reason is that we can apply per-page access protections. For instance, marking the “no-execute” (NX) bit for data sections, and marking .rodata
as read-only. In doing so, if we make a mistake and access something we shouldn’t, we’ll hopefully get a (easily debuggable) hardware exception rather than (hard to debug) undefined behavior. This is also a security enhancement.
This is performed in the Mem_Mgr
package, when we map all physical memory in the system into the kernel’s page tables with the appropriate flags. Here, Ada’s type system helps us out again, since we can define ranges with the section start and end addresses we exported in the linker script:
In mem_mgr.ads
:
...
subtype kernelTextPages is Virtmem.PFN range
Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.stext'Address))) ..
Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.etext'Address) - 1));
subtype kernelROPages is Virtmem.PFN range
Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.srodata'Address))) ..
Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.erodata'Address) - 1));
subtype kernelRWPages is Virtmem.PFN range
Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.sdata'Address))) ..
Virtmem.addrToPFN(Virtmem.K2P(To_Integer(Virtmem.ebss'Address) - 1));
...
This looks a bit unwieldy, but we’re just defining Ada ranges for each ELF section, converting them to their physical address, and then to a PFN or “page frame number” to make them easier to work with. Later, we’ll iterate through all the memory areas that the bootloader identifies, mapping pages appropriately. Ada makes it easy to check whether the page frame we’re looking at belongs to one of those ranges:
In mem_mgr.adb
:
procedure determineFlagsAndMapFrame(frame : in Virtmem.PFN) is
...
begin
if frame in kernelTextPages then
mapPage(fromPhys,
toVirtLinear,
Virtmem.PG_KERNELDATARO,
kernelP4,
ok);
if not ok then raise RemapException; end if;
mapPage(fromPhys,
toVirtKernel,
Virtmem.PG_KERNELCODE,
kernelP4,
ok);
...
Here, if the PFN falls into the kernelTextPages
range, then we’ll map it into virtual memory twice. Once as read-only data (PG_KERNELDATARO
) in what I call the “linear range”, at 0xFFFF_8000_0000_0000 and up, where we map all physical memory in the system, and then again as in the upper 2GiB of memory at 0xFFFF_FFFF_8000_0000 and up, where it was linked and executes from. At boot, CuBit will display the memory addresses of the sections as they are linked:
...
Kernel sections:
text: 0xFFFFFFFF80100000-0xFFFFFFFF80127000
rodata: 0xFFFFFFFF80127000-0xFFFFFFFF8012B000
data: 0xFFFFFFFF8012B000-0xFFFFFFFF8012C000
bss: 0xFFFFFFFF8012C000-0xFFFFFFFF804300A8
...
Modifying the Runtime #
SPARK comes with a “zero footprint runtime” (rts-zfp
) that you can find in the GNAT CE package. I’ve modified it very slightly from the original, so the CuBit repository has it’s own copy. Much of the work that I’ve put into CuBit is getting the operating system to a point where it can support language and library features that we’re all used to in our daily development, such as text I/O and memory allocation.
Allocators #
Since we have to provide all the memory allocation ourselves, CuBit has 2 physical memory allocators. The first is the BootAllocator
. This uses static bitmaps with support for allocating a limited amount of memory using a relatively slow linear search.
The other is the BuddyAllocator
. CuBit’s buddy allocator is implemented as an array of circularly-linked lists of free blocks of power-of‑2 multiples of the x86-64 small page size (4K). Whew, that was a mouthful!
I know what you’re thinking: “circularly-linked lists are prohibited in SPARK.” Well, yes. Here we get around that by overlaying the “free list record” on top of the underlying physical memory to be allocated. When removing the block from the free list (either because it was allocated, or because we’re combining it with its buddy to form a larger block), we call the unlink
procedure.
In buddyallocator.adb:
procedure unlink(ord : in Order; addr : in Virtmem.PhysAddress) with
SPARK_Mode => On,
Pre => freeLists(ord).numFreeBlocks > 0,
Post => freeLists(ord).numFreeBlocks =
freeLists(ord).numFreeBlocks'Old - 1
is
block : aliased FreeBlock with
Import, Address => To_Address(addr);
prevAddr : constant System.Address := block.prevBlock;
nextAddr : constant System.Address := block.nextBlock;
begin
linkNeighbors:
declare
prevBlock : aliased FreeBlock with
Import, Address => prevAddr;
nextBlock : aliased FreeBlock with
Import, Address => nextAddr;
begin
prevBlock.nextBlock := nextAddr;
nextBlock.prevBlock := prevAddr;
end linkNeighbors;
-- decrement the free list count when we unlink somebody
freeLists(ord).numFreeBlocks :=
freeLists(ord).numFreeBlocks - 1;
end unlink;
This makes it a bit harder to prove certain properties about the allocator, so might not be considered the best technique. I believe its an acceptable compromise here as it allows us to maintain SPARK_Mode in the subprogram body and allow proving other properties (absence of overflow, etc.) of the code.
CuBit enforces strict alignment of all block sizes, which causes us to sacrifice a little bit of memory to external fragmentation, but it makes the buddy calculations for splitting and rejoining blocks simpler and faster.
For object allocation, CuBit uses a curious feature of GNAT called the Simple_Storage_Pool
pragma. I couldn’t find much documentation about this, but was pleased to discover that we need little-to-no runtime support for this, but can still use the familiar new
keyword to allocate objects at runtime. This might open up the use of tagged records and controlled types in CuBit, but needs more experimentation before I can say one way or the other.
The SlabAllocator
implements a circularly-linked list of free objects. It works similarly to the BuddyAllocator
, but instead of page order-aligned blocks of memory, it works on objects of a fixed size, with user-specified alignment. By allocating from and free-ing to the head of the list, we can get fast O(1) allocations and deallocations. We do give up some flexibility with the slab allocator, since the object size is fixed and must be specified ahead of time.
In SlabAllocator.ads
:
type FreeNode is
record
next : System.Address;
prev : System.Address;
end record with Size => 16 * 8;
for FreeNode use
record
next at 0 range 0..63;
prev at 8 range 0..63;
end record;
type Slab is limited record
freeList : FreeNode;
numFree : Integer := 0;
capacity : Integer := 0;
blockOrder : BuddyAllocator.Order;
blockAddr : Virtmem.PhysAddress;
mutex : aliased Spinlock.Spinlock;
alignment : System.Storage_Elements.Storage_Count;
paddedSize : System.Storage_Elements.Storage_Count;
initialized : Boolean := False;
end record;
-- GNAT-specific pragma
pragma Simple_Storage_Pool_Type(Slab);
Use of this allocator should look somewhat familiar. We need to initialize the slab allocator with a setup routine that will allocate underlying physical memory from the buddy allocator described earlier.
...
objSlab : SlabAllocator.Slab;
type myObjPtr is access myObject;
for myObjPtr'Simple_Storage_Pool use objSlab;
procedure free is new Ada.Unchecked_Deallocation(myObject, myObjPtr);
obj : myObjPtr;
begin
SlabAllocator.setup(objSlab, myObject'Size);
obj := new Object;
...
Record Representation #
Note that I use both Size
and a record representation clause for the FreeNode
record in SlabAllocator
. This is overkill here, but generally I like to use both, especially for records that might be overlaid on memory-mapped hardware, used directly by the CPU (like page tables), or overlaid on in-memory BIOS structures like the ACPI tables. It helps keep me honest so that when I add fields to a record the compiler will ensure that the size and representation match up. Consider something like the ACPI FADT structure:
In acpi.ads
:
---------------------------------------------------------------------------
-- FADT - Fixed ACPI Description Table.
---------------------------------------------------------------------------
type FADTRecord is
record
header : SDTRecordHeader;
firmwareControl : Unsigned_32; -- ignored if exFirmwareControl present
dsdt : Unsigned_32; -- ignored if exDsdt present
reserved1 : Unsigned_8;
powerMgmtProfile : PowerManagementProfile;
sciInterrupt : Unsigned_16;
smiCommand : Unsigned_32;
acpiEnable : Unsigned_8;
acpiDisable : Unsigned_8;
S4BIOSReq : Unsigned_8;
pStateControl : Unsigned_8;
PM1AEventBlock : Unsigned_32;
PM1BEventBlock : Unsigned_32;
PM1AControlBlock : Unsigned_32;
PM1BControlBlock : Unsigned_32;
PM2ControlBlock : Unsigned_32;
PMTimerBlock : Unsigned_32;
GPE0Block : Unsigned_32;
GPE1Block : Unsigned_32;
PM1EventLength : Unsigned_8;
PM1ControlLength : Unsigned_8;
PM2ControlLength : Unsigned_8;
PMTimerLength : Unsigned_8;
GPE0BlockLength : Unsigned_8;
GPE1BlockLength : Unsigned_8;
GPE1Base : Unsigned_8;
cStateControl : Unsigned_8;
pLevel2Latency : Unsigned_16;
pLevel3Latency : Unsigned_16;
flushSize : Unsigned_16;
flushStride : Unsigned_16;
dutyOffset : Unsigned_8;
dutyWidth : Unsigned_8;
dayAlarm : Unsigned_8;
monthAlarm : Unsigned_8;
century : Unsigned_8; -- RTC index into RTC RAM if not 0
intelBootArch : Unsigned_16; -- IA-PC boot architecture flags
reserved2 : Unsigned_8; -- always 0
flags : Unsigned_32; -- fixed feature flags
resetRegister : GenericAddressStructure;
resetValue : Unsigned_8;
armBootArch : Unsigned_16;
fadtMinorVersion : Unsigned_8;
exFirmwareControl : Unsigned_64;
exDsdt : Unsigned_64;
exPM1AEventBlock : GenericAddressStructure;
exPM1BEventBlock : GenericAddressStructure;
exPM1AControlBlock : GenericAddressStructure;
exPM1BControlBlock : GenericAddressStructure;
exPM2ControlBlock : GenericAddressStructure;
exPMTimerBlock : GenericAddressStructure;
exGPE0Block : GenericAddressStructure;
exGPE1Block : GenericAddressStructure;
-- ACPI 6 fields (not supported yet)
--sleepControlReg : GenericAddressStructure;
--sleepStatusReg : GenericAddressStructure;
--hypervisorVendor : Unsigned_64;
end record with Size => 244*8;
for FADTRecord use
record
header at 0 range 0..287;
firmwareControl at 36 range 0..31;
dsdt at 40 range 0..31;
reserved1 at 44 range 0..7;
powerMgmtProfile at 45 range 0..7;
sciInterrupt at 46 range 0..15;
smiCommand at 48 range 0..31;
acpiEnable at 52 range 0..7;
acpiDisable at 53 range 0..7;
S4BIOSReq at 54 range 0..7;
pStateControl at 55 range 0..7;
PM1AEventBlock at 56 range 0..31;
PM1BEventBlock at 60 range 0..31;
PM1AControlBlock at 64 range 0..31;
PM1BControlBlock at 68 range 0..31;
PM2ControlBlock at 72 range 0..31;
PMTimerBlock at 76 range 0..31;
GPE0Block at 80 range 0..31;
GPE1Block at 84 range 0..31;
PM1EventLength at 88 range 0..7;
PM1ControlLength at 89 range 0..7;
PM2ControlLength at 90 range 0..7;
PMTimerLength at 91 range 0..7;
GPE0BlockLength at 92 range 0..7;
GPE1BlockLength at 93 range 0..7;
GPE1Base at 94 range 0..7;
cStateControl at 95 range 0..7;
pLevel2Latency at 96 range 0..15;
pLevel3Latency at 98 range 0..15;
flushSize at 100 range 0..15;
flushStride at 102 range 0..15;
dutyOffset at 104 range 0..7;
dutyWidth at 105 range 0..7;
dayAlarm at 106 range 0..7;
monthAlarm at 107 range 0..7;
century at 108 range 0..7;
intelBootArch at 109 range 0..15;
reserved2 at 111 range 0..7;
flags at 112 range 0..31;
resetRegister at 116 range 0..95;
resetValue at 128 range 0..7;
armBootArch at 129 range 0..15;
fadtMinorVersion at 131 range 0..7;
exFirmwareControl at 132 range 0..63;
exDsdt at 140 range 0..63;
exPM1AEventBlock at 148 range 0..95;
exPM1BEventBlock at 160 range 0..95;
exPM1AControlBlock at 172 range 0..95;
exPM1BControlBlock at 184 range 0..95;
exPM2ControlBlock at 196 range 0..95;
exPMTimerBlock at 208 range 0..95;
exGPE0Block at 220 range 0..95;
exGPE1Block at 232 range 0..95;
-- ACPI 6 fields
--sleepControlReg at 244 range 0..95;
--sleepStatusReg at 256 range 0..95;
--hypervisorVendor at 268 range 0..63;
end record;
It’s very easy to make mistakes in big records like this! Thankfully Ada’s record representation clauses allow us to depend on the field alignments and sizes that we specify.
Per-CPU Data, Per-CPU Stacks, and Secondary Stacks #
Since CuBit is a multi-core operating system, we need to maintain separate call stacks for each CPU. The mechanism by which this is done is fairly primitive at this point. We just carve out a chunk of memory (8 MiB as of the time of writing, but I suspect we’ll eventually want more) for the kernel stacks (STACK_BOTTOM
to STACK_TOP
). This is divided up by the maximum number of cores that CuBit is currently set to support (128), so we get a 64k-sized stack for each CPU present.
These are kernel-mode stacks, so call chains should be fairly short. Each of these stacks is also divided up into the primary stack, which grows downward, and then the Ada secondary stack, which grows upward. The secondary stack is not widely used, but it does work. It’s set at a paltry 2048 bytes for now, in the System.Parameters.Runtime_Default_Sec_Stack_Size
runtime package.
We need to set the stack pointer early on, so that we can call adainit
to perform elaboration, and then our kmain
function.
In boot.asm:
...
setup_bsp:
; Setup our kernel stack.
mov rsp, qword (STACK_TOP)
; Add a stack canary to bottom of primary stack for CPU #0
mov rax, qword (STACK_TOP - PER_CPU_STACK_SIZE + SEC_STACK_SIZE)
mov rbx, 0xBAD_CA11_D37EC7ED
mov [rax], rbx
; Save rdi, rsi so adainit doesn't clobber them
push rdi
push rsi
; Initialize with adainit for elaboration prior to entering Ada.
mov rax, qword adainit
call rax
; Restore arguments to kmain
pop rsi
pop rdi
; call into Ada code
mov rax, qword kmain
call rax
...
The secondary stack is then initialized with a call to System.Secondary_Stack.SS_Init
with the address for the bottom of the CPU stack for the bootstrap processor/CPU #0.
Remember those APs that boot into real-mode? We need to set their initial stack pointers as well, once they get to 64-bit mode. When we tell those processors to boot, we set a global variable startingCPU
in the Ada code prior to sending the interrupt that tells them to start.
We can use that to calculate where that CPU’s per-CPU stack should live in the setup_ap:
. Each stack size is (STACK_TOP - STACK_BOTTOM) / MAX_CPUS
. Then we can set the stack for CPU N by setting its RSP (stack pointer) to STACK_TOP - (startingCPU * stack size)
.
This raises another issue. Multi-core code can use the primary stack easily, since each CPU has its own stack pointer register. However, the Ada secondary stack needs to be manipulated using data and a pointer kept in main memory. So how do we know which secondary stack to use when we’re running code on an arbitrary CPU? It is not always obvious which CPU a piece of code is running on in a multi-core system.
CuBit uses a per-CPU data structure. It is implemented similar to the way that thread-local storage is, using the GS register to store an offset to the per-CPU data. This data structure includes information like the CPU number it represents and the currently running process ID. When code on a CPU attempts to use the secondary stack, it gets a pointer to the dedicated secondary stack for that CPU. This is obviously slower than accessing the RSP register for the primary stack.
In this case, CuBit actually keeps the per-CPU data on each of the primary stacks we set up earlier. Generally speaking its bad form to take the address of a stack variable, but here we create the per-CPU data record just prior to entering the scheduling subprogram which loops infinitely, so it is safe to access as though it were statically allocated.
When the secondary stack package goes to allocate, it calls __gnat_get_secondary_stack
, which is the export name of the getSecondaryStack
function in the PerCPUData
package. We use this address in SS_Allocate
, which is called behind the scenes whenever the secondary stack is used.
Scheduling and Tasking #
The CuBit scheduler is very primitive and very simple. It just uses a basic round-robin model for now, with a lot of work to be done for various process states like sleeping, etc. This is currently where I’m spending a lot of my time on CuBit. Context switching is theoretically just a matter of saving one register set and restoring another, and returning to the saved return address on the stack. x86 makes this a bit more challenging, and the details are messy. I’m still working through a lot of the finer points, but it mostly works. As of the time of writing, there’s a small user-mode stub in init.asm
. This is the first user-mode process, which will eventually perform the system call to execute whatever init framework is used.
SPARK can verify concurrent programs using the Ravenscar tasking model. This is ideal for bare-metal applications where the tasks are known ahead of time. However, in a general-purpose OS, we don’t know what applications are going to be run, nor should we place unnecessary restrictions on them. There might be a good way to model user processes as Ada tasks in order to get some guarantees from the compiler, but I haven’t worked out a way to do this, and it may in fact not be feasible.
The difficulty, as far as I can tell, lies in describing which parts of the code might be running concurrently and modeling their interactions. Between multiple cores, system calls, and interrupts, it gets hairy very quickly. Careful use of locking primitives is the typical route for dealing with this, but ensuring the absence of deadlock is difficult. If there are any SPARK experts reading this with some good ideas in this area, please let me know!
The Future #
CuBit is still in a very early stage of development. It’s just been a side-project of mine for now, and frankly I’m a little embarrassed to write about it in it’s current state! I suppose they say “if you aren’t embarrassed by your first release than you’ve waited too long to release”. I welcome contributions from the wider Ada/SPARK community. If, like me, you dream about formally-verified software taking over the world some day, then please check it out at https://github.com/docandrew/CuBit/ and start submitting those pull requests!