AdaCore Blog

Let’s Write a Safety Monitor for a Mars Rover!

Let’s Write a Safety Monitor for a Mars Rover!

by Anthony Aiello

Recently, we developed a new demo platform: the Ada Mars Rover. You can find the sources for this project here.

One of the goals of building the Mars Rover is to demonstrate SPARK. As an early step, Fabien introduced a new hardware abstraction layer (HAL) on top of the existing HALs that had been developed for the various pieces of hardware that make up the Mars Rover. The result is a façade that provides a simplified and unified view of the Mars Rover hardware, a single point at which the hardware can be replaced by a simulator, and an opportunity to introduce the necessary SPARK contracts to enable proof for the rover software. Once Fabien had completed his work on the HAL, the entirety of the Rover software proved at SPARK silver - that is, SPARK proved the absence of run-time exceptions (AoRTE).

I wanted to take this a step further: I wanted to introduce a gold-level property - a functional property about the software that we could imagine was traced to the Mars Rover requirements. A fairly obvious property suggests itself when considering the Rover: it shouldn’t crash into obstacles. So, I formalized this property and proved it. Let me show you how.

Formalizing the Safety Property

It’s perhaps a bit cheeky to suggest that we’ll prove that the Rover won’t crash into obstacles. Many things can go wrong over which the software has no control: an obstacle could roll into the path of the Rover at the last minute; a hardware fault could leave both motors stuck at full power; the sonar could fail to detect an obstacle; etc. So when we say “the Rover cannot crash”, what we really mean is that the Rover won’t attempt to go forward when it knows there’s an obstacle.

Here’s how we might write our safety requirement in natural language:

The Mars Rover shall not proceed straight forward (i.e., directly forward, without turning) when the distance to an obstacle is less than the Safety Threshold.

This requirement isn’t perfect, but it’s good enough for us to move forward (so to speak!). Let’s take a look at how we might formalize it.

This kind of safety requirement naturally maps to the idea of a safety monitor: some piece of functionality that will consider the inputs to and potentially internal states of the controller and then check to see if the outputs of the controller are acceptable. For our requirement, we need to consider the input that represents the distance to an obstacle and the outputs that indicate the rover is proceeding straight forward, i.e., both motors with a positive power level and the turn set to straight.

Looking at the HAL (rover_hal.ads), we see that the Rover is expected to poll the sonar for distance:

function Sonar_Distance return Unsigned_32 with
  Pre    => Initialized,
  Global => (HW_State, HW_Init),
  Side_Effects,
  Volatile_Function;

We can also see that the Rover is expected to set the turn mode and the motor power for each wheel:

procedure Set_Turn (Turn : Turn_Kind) with
  Pre  => Initialized,
  Post => Initialized,
  Global => (HW_State, HW_Init);
procedure Set_Power (Side : Side_Id;
                     Pwr  : Motor_Power)
with
  Pre  => Initialized,
  Post => Initialized,
  Global => (HW_State, HW_Init);

Given this information, we formalize our requirement like this:

if Last_Sonar_Distance < Safety_Threshold and then
   Last_Turn = Straight
then
   Last_Power (Left)  <= 0 and then
   Last_Power (Right) <= 0

where the Last_ prefixes indicate some means to refer to whatever the last value that was retrieved (in the case of the sonar) or set (in the case of the turn and motor powers)1.

We can validate that these are the right kinds of properties to check by looking at how the controller modes for the Rover are implemented. In rover-remote_controlled.adb, for example, we see sets of calls to the HAL like this:

when Forward =>
  Set_Turn (Straight);
  Set_Power (Left, 100);
  Set_Power (Right, 100);

So we seem to be on the right track in our formalization of the safety requirement.

Providing Getters

If the HAL had getters, we could simply use them to write our formalized safety requirement. But the HAL doesn’t have getters - only setters. We will have to provide getters so that we can use them in our formalized requirement. SPARK lets us do this without forcing us to implement the getters, as we will see2.

The Turn Getter

SPARK gives us the ability to model the behavior of the getters using uninterpreted functions and abstract state.

An uninterpreted function is a Ghost function that we declare but for which we never provide an implementation. Abstract state tells SPARK that the getter is related to something that may change as a result of calling the setter. Here’s how we introduce them.

First, we introduce our abstract state:

Abstract_State => (HW_Init,
                   (HW_State with Synchronous),
                   (Turn_State with Ghost))

Then, we declare the getter function like this:

function Get_Turn return Turn_Kind with
  Pre    => Initialized,
  Global => (HW_Init, Turn_State),
  Ghost,
  Import;

This gives us a means to talk about the last turn value that was set. Now, we need to tie this model of a getter to the setter, which we do by updating the setter like this:

procedure Set_Turn (Turn : Turn_Kind) with
  Pre  => Initialized,
  Post => Initialized and then
          Get_Turn = Turn,
  Global => (Input  => (HW_State, HW_Init),
             In_Out => Turn_State);

We have used the abstract state to tell SPARK through the Global annotation that there’s something changed by the setter that the getter needs to know about. And in the postcondition of Set_Turn, we have declared that when we call Get_Turn, the value returned will equal the value given in the formal parameter Turn. We have declared that Get_Turn behaves like a getter should.

The Power Getter

We might assume that we can write the getter for motor power in exactly the same way, like this:

procedure Set_Power (Side : Side_Id;
                       Pwr  : Motor_Power)
with
  Pre  => Initialized,
  Post => Initialized and then
          Get_Power (Side) = Pwr,
  Global => (Input  => (HW_State, HW_Init),
             In_Out => Power_State);


function Get_Power (Side : Side_Id) return Motor_Power with
  Pre    => Initialized,
  Global => (HW_Init, Power_State),
  Ghost,
  Import;

Unfortunately, we then run into a problem when attempting to prove the following:

Set_Power (Left, 0);
pragma Assert (Get_Power (Left) = 0);

Set_Power (Right, 0);
pragma Assert (Get_Power (Left)  = 0);
pragma Assert (Get_Power (Right) = 0);

We can prove the first and third assertions, but not the second. What’s going on?

Looking closely at Set_Power, we see that we’ve told SPARK how Get_Power will behave when called with Side specified - but not that Get_Power will continue to return the same value as before when called with any other Side specified. That this should be true is obvious to us - but without having stated it, SPARK will not assume it. (This is the safe approach; SPARK’s not doing anything wrong here.)

We need to adjust Set_Power so that it will maintain that relationship for us. Since there are only two sides, the simplest approach is to do this:

procedure Set_Power (Side : Side_Id;
                       Pwr  : Motor_Power)
with
  Pre  => Initialized,
  Post => Initialized and then
          Get_Power (Side) = Pwr and then
          (if Side = Left then
             Get_Power (Right) = Get_Power (Right)'Old
           else
             Get_Power (Left) = Get_Power (Left)'Old),
  Global => (Input  => (HW_State, HW_Init),
             In_Out => Power_State);

The Distance Getter

For completeness, here’s the getter for Sonar_Distance - there’s nothing surprising going on here:

function Sonar_Distance return Unsigned_32 with
  Pre    => Initialized,
  Post   => Get_Sonar_Distance = Sonar_Distance'Result,
  Global => (Input => (HW_State, HW_Init),
             In_Out => Distance_State),
  Side_Effects,
  Volatile_Function;

function Get_Sonar_Distance return Unsigned_32 with
  Pre    => Initialized,
  Global => (HW_Init, Distance_State),
  Ghost,
  Import;

The Safety Property

With our getters in place, we add the safety property to rover.ads - this makes it available anywhere in the Rover package hierarchy:

function Cannot_Crash return Boolean is
  (if Rover_HAL.Get_Sonar_Distance < Safety_Distance and then
      Rover_HAL.Get_Turn = Rover_HAL.Straight
   then
     Rover_HAL.Get_Power (Rover_HAL.Left)  <= 0 and then
     Rover_HAL.Get_Power (Rover_HAL.Right) <= 0);

We have set the safety distance to 20:

Safety_Distance : constant Interfaces.Unsigned_32 := 20;

This matches the design of the remote-control mode.

Proving Controller Safety

We then attach the monitor to our controllers. We’ll do this top-down. Often, when we add contracts to SPARK code for proof, we do this bottom-up - but in this case, it’s easier to think about what’s going on and decide where we need the contracts if we start at the top: Rover.Tasks.Demo.

Although not strictly necessary (since the demo loop never terminates), we’ll add a postcondition in the rover-tasks.ads file like this:

procedure Demo with
  Pre  => Rover_HAL.Initialized,
  Post => Rover_HAL.Initialized and then
          Rover.Cannot_Crash;

This way, if we ever change how Demo behaves and decide to use it in another context, we’re promising that it will leave the Rover in a safe state.

Looking at the body, we see a single loop that executes the autonomous-control mode followed by the remote-control mode, forever. We add the Loop Invariant like this:

loop
  Rover.Autonomous.Run;
  Rover.Remote_Controlled.Run;

  pragma Loop_Invariant (Rover.Cannot_Crash);
end loop;

Of course, this doesn’t prove yet: we need to add Rover.Cannot_Crash as a postcondition of Rover.Remote_Controlled.Run, first. And since we’d like this property to be invariant, we’ll also make it a postcondition of Rover.Autonomous.Run.

Let’s start with the autonomous-control mode, as it’s actually simpler to prove than the remote-control mode!

Autonomous Control Mode

We add the postcondition to rover-autonomous.ads like this:

procedure Run with
  Pre => Rover_HAL.Initialized,
  Post => Rover_HAL.Initialized and then
          Rover.Cannot_Crash;

We don’t need a precondition asserting the safety property because the first thing Run does is stop everything:

--  Stop everything
Set_Turn (Straight);
Set_Power (Left, 0);
Set_Power (Right, 0);

We then add a Loop Invariant to the body of Run:

while not State.User_Exit loop
  Go_Forward (State);
  Find_New_Direction (State);

  pragma Loop_Invariant (Rover.Cannot_Crash);
end loop;

Again, this won’t prove until we add the necessary postcondition to Find_New_Direction. Since Find_New_Direction is local to the body, we add it like this:

procedure Find_New_Direction (This : in out Auto_State) with
  Pre  => Initialized,
  Post => Initialized and then
          Rover.Cannot_Crash

SPARK proves that the postcondition is guaranteed by the body of Find_New_Direction without any additional proof annotations at proof Level 1. And SPARK proves the loop in Run and the postcondition for Run at this point.

Before we move on, however, we should consider that, since our safety property is meant to be invariant (never violated), we should probably also consider what’s going on in Go_Forward. Go_Forward starts by unconditionally commanding the robot to move forward and then enters a loop looking for obstacles. Since it starts by moving forward, we should ensure that we don’t move forward unless the safety property is met. We’ll make this a precondition of Go_Forward.

procedure Go_Forward (This : in out Auto_State) with
  Pre  => Initialized and then
          Rover.Cannot_Crash,
  Post => Initialized

Why didn’t we also make the safety property a postcondition? Let’s take a look at the loop in Go_Forward

loop
  Check_User_Input (This);
  exit when This.User_Exit;

  Next_Mast_Pos (This, -55, 55, Milliseconds (10));
  Distance := Sonar_Distance;
  exit when Distance < 30;

  Delay_Milliseconds (10);
end loop;

There are two conditions under which we exit this loop: (1) if the user presses a button; and (2) if the distance to an obstacle is less than 30 (which is more conservative than our safety threshold). Unfortunately, neither of these conditions on its own is enough to say that we are safe. For (1), the Rover could be too close to an obstacle (and still be commanded to move straight forward). For (2), we know the upper bound on the distance, but not the lower bound - we’d have to know that the distance is, e.g., greater than or equal to 20 but less than 30. Vehicle dynamics and the update rate of the sonar may guarantee this, but we don’t have this information available simply from the code. Thus if we tried to prove the safety property as a Loop Invariant and then as a postcondition, we would not succeed3.

Arguably, it’s okay that we don’t establish the safety property as an invariant or postcondition here. We believe we’re safe because of the second exit condition and Find_New_Direction doesn’t assume the safety property to do its job.

With the contracts on Go_Forward and Find_New_Direction in place, SPARK proves everything in Rover.Autonomous.

Note: on the way out, the autonomous-control mode stops everything:

--  Stop everything before leaving the autonomous mode
Set_Turn (Straight);
Set_Power (Left, 0);
Set_Power (Right, 0);

This will become important in a moment!

Remote Control Mode

We add both a precondition and postcondition to rover-remote_controlled.ads like this:

procedure Run with
  Pre  => Rover_HAL.Initialized and then
          Rover.Cannot_Crash,
  Post => Rover_HAL.Initialized and then
          Rover.Cannot_Crash;

The precondition gives us some confidence that the Rover isn’t already in trouble (although we’ll see in a moment that it’s not as strong as we might like, given how the remote-control mode was originally written!).

As above, we add our safety property in a Loop Invariant at the bottom of the loop in the body of Run.

      [...]

      Delay_Milliseconds (30);

      pragma Loop_Invariant (Rover.Cannot_Crash);
   end loop;
end Run;

And … SPARK can’t prove this - at any proof Level:

rover-remote_controlled.adb:144:33: medium: loop invariant might fail in first iteration, cannot prove Rover_HAL.Get_Power (Rover_HAL.Left) <= 0
  144 |         pragma Loop_Invariant (Rover.Cannot_Crash);
      |                                ^~~~~~~~~~~~~~~~~~
  in inlined expression function body at rover.ads:27


rover-remote_controlled.adb:144:33: medium: loop invariant might not be preserved by an arbitrary iteration, cannot prove Rover_HAL.Get_Power (Rover_HAL.Left) <= 0
  144 |         pragma Loop_Invariant (Rover.Cannot_Crash);
      |                                ^~~~~~~~~~~~~~~~~~
  in inlined expression function body at rover.ads:27

When proof failures like this show up, a good approach is to talk through why we believe the property should hold. Let’s take a look at the loop again:

[...]
Dist := Sonar_Distance;

if Dist < Rover.Safety_Distance then
  --  Ignore forward commands when close to an obstacle
  Buttons (Up) := False;
end if;

Cmd := To_Command (Buttons);

if Cmd /= Last_Cmd then
  case Cmd is
  when None =>
     Set_Power (Left, 0);
     Set_Power (Right, 0);
  when Forward =>
     Set_Turn (Straight);
     Set_Power (Left, 100);
     Set_Power (Right, 100);
[...]

So we read the distance, check it against the safety threshold, and release the Up button if we’re too close to an obstacle. Then we convert our buttons to a command and set the turn and motor power accordingly. Presumably, releasing the Up button means that we won’t go forward. This information is contained in the function To_Command.

But SPARK’s analysis is modular: it only knows about the current subprogram (and its nested subprograms, if any) and expression functions. When we look at To_Command, we see that it’s neither a nested function nor an expression function. And because it has no contract, SPARK has no idea what it’s doing and assumes that the result could be any valid value of the Remote_Command type. In essence, we’ve hidden critical information from SPARK!

So let’s turn this into an inlined expression function so that SPARK can see what’s happening:

function To_Command (Buttons : Buttons_State) 
                     return Remote_Command
is
   (if Buttons (Up) and then Buttons (Left) then
      Forward_Left
   elsif Buttons (Up) and then Buttons (Right) then
      Forward_Right
   elsif Buttons (Down) and then Buttons (Left) then
      Back_Left
   elsif Buttons (Down) and then Buttons (Right) then
      Back_Right
   elsif Buttons (Up)  then
      Forward
   elsif Buttons (Right) then
      Turn_Right
   elsif Buttons (Left) then
      Turn_Left
   elsif Buttons (Down) then
      Backward
   else
      None)
with
   Annotate => (GNATprove, Inline_for_Proof);

Now we might expect SPARK has the information it needs to prove our property. But when we try again, SPARK still responds that Loop Invariant may not hold.

The structure of the loop is fairly straightforward: all the information SPARK needs appears to be there - provided, anyway, that the new command Cmd is not equal to the old command Last_Cmd. If the two commands are the same, however, the Rover continues to operate based on the turn and motor power levels previously set.

In general, SPARK forgets everything that happened in the previous loop iteration when considering the current loop iteration. Only information that is asserted via Loop Invariants is carried forward into the next iteration. So SPARK essentially doesn’t know what our getters will return, unless Cmd /= Last_Cmd. What should we tell SPARK it needs to remember?

Looking at the case statement and recalling our safety property, we’d like to remember the following:

--  Carry the information about the relationship between Cmd and the
--  turn and power settings around the loop, since when Cmd = Last_Cmd
--  these values are retained.
pragma Loop_Invariant (if Cmd /= Forward then
                         Get_Turn /= Straight or
                         (Get_Power (Left) <= 0 and
                          Get_Power (Right) <= 0));

The intuition is that because we don’t move forward when the safety threshold is violated, we only need to make sure that, when the safety threshold is violated, we’re not going straight with positive motor values. It’s fine if we’re turning or if we’re going straight but backward.

With this Loop Invariant, SPARK proves the safety property - so this Loop Invariant is indeed useful. Unfortunately, it doesn’t prove: SPARK tells us that it may fail in the first iteration. (But SPARK does prove that it holds in subsequent iterations, which leads us to believe that it’s probably the right Loop Invariant.) Why not?

A useful approach to debugging a proof like this is to consider the relationship between the invariant and the structure of the code. Here, as we said above, there are two cases of interest: Cmd = Last_Cmd and Cmd /= Last_Cmd. We can essentially split the Loop Invariant, expressing these two cases, like this

pragma Loop_Invariant (if Cmd = Last_Cmd and Cmd /= Forward then
                         Get_Turn /= Straight or
                         (Get_Power (Left) <= 0 and
                          Get_Power (Right) <= 0));

pragma Loop_Invariant (if Cmd /= Last_Cmd and Cmd /= Forward then
                         Get_Turn /= Straight or
                         (Get_Power (Left) <= 0 and
                          Get_Power (Right) <= 0));

Now, SPARK tells us that the first Loop Invariant - where Cmd = Last_Cmd - fails on the first iteration, which means we can ignore the body of the if statement containing the case statement with the Rover commands entirely in our debugging. Let’s talk through how the Run procedure works again, with this knowledge in mind, and focus on the first iteration of the loop.

The Run procedure initializes Cmd to None. When the loop starts, the buttons on the remote control are polled, and then the value of Cmd is copied to Last_Cmd - so now Cmd and Last_Cmd are both None. The safety threshold is then checked. If the Rover is too close to an obstacle, the Up button is released. The buttons are then converted to a command, which is assigned to Cmd.

It’s possible that none of the buttons are pressed: either because the user isn’t pressing a button or because the user was pressing the Up button but the Rover was too close to an obstacle. So it’s possible that Cmd will (again) be assigned None. Thus it’s possible that Cmd = Last_Cmd. And in that case, the Rover will continue doing whatever it was doing before.

From the information available in this procedure, there’s no reason to assume that the Rover wasn’t previously driving straight forward. And even though we checked the safety threshold, because Cmd = Last_Cmd, we’re not going to change the Rover’s turn or motor powers. So our safety property could indeed be violated in the first iteration. SPARK is correct.

Have we found a bug that could have resulted in the Rover crashing? No - not insofar as the current Rover demo is concerned. Recall that, on the way out of autonomous-control mode, everything is stopped, as I noted above. This guarantees that when Cmd = Last_Cmd in the first iteration, the Rover will (continue to) be stopped - and our safety property isn’t violated.

But SPARK doesn’t know this. And the importance of stopping the Rover when handing off to the remote-control mode isn’t documented anywhere. So if this module were reused in a different context, this unwritten assumption could be violated, leading to the possibility of an accident. And if this sounds farfetched, there are high-profile examples of exactly this happening in safety-critical systems. So let’s fix this.

Fixing the Remote Control Mode

There are a couple of ways we could fix the problem with the remote-control mode.

We might introduce a precondition that requires the Rover be stopped - or in some similarly safe state - before the remote-control mode is called. We could this using our getters, like this:

procedure Run with
  Pre  => Rover_HAL.Initialized and then
          Rover.Cannot_Crash and then
          Rover_HAL.Get_Power (Rover_HAL.Left) = 0 and then
          Rover_HAL.Get_Power (Rover_HAL.Right) = 0,
  Post => Rover_HAL.Initialized and then
          Rover.Cannot_Crash;

With this change, SPARK fully proves Rover.Remote_Controlled at Level 1. (We would have to add these constraints to the postcondition of Rover.Autonomous.Run to prove the demo fully.)

It does seem like a pity that the Rover has to be stopped to activate the remote-control mode, however. We might imagine an enhanced version of the demo that allows a user to seamlessly take control of the Rover, by pressing the button on the remote control corresponding to the Rover’s current motion.

Instead, we can introduce a new Rover_Command that represents the initial, invalid state of Last_Cmd, to better handle the first loop iteration. We add the new command like this:

type Remote_Command is (Invalid,
                        None,
                        Forward,
                        Backward,
                        Turn_Left,
                        Turn_Right,
                        Forward_Left,
                        Forward_Right,
                        Back_Left,
                        Back_Right);

We then update the case statement to account for this new command, by removing the case for None and adding a when others at the bottom:

when others =>
  Set_Power (Left, 0);
  Set_Power (Right, 0);
end case;

These changes ensure that, in the first iteration, Cmd /= Last_Cmd, because To_Command can never generate this Invalid command. In this way, we preserve greater flexibility in how the remote-control mode can be activated while still preserving the safety property. With this change, SPARK fully proves the demo at Level 1.

Conclusion

We’ve illustrated how a safety requirement for a simple Mars rover can be formalized and used to prove that the rover controller can never command the rover to crash. We implemented the formalism using Ghost state and uninterpreted functions that model getters without changing the executable semantics of the rover HAL in any way. We attached the model to our code and found an unstated assumption in the remote-control mode that could have allowed the Rover to crash into an obstacle, were that mode reused as a component in a new system - so we fixed the remote-control mode to ensure safety regardless of how the remote-control mode is called. Finally, we showed that our model is proved by SPARK at Level 1 with minimal need for intermediate assertions.


[1] Sonar_Distance is already a “getter”, in the sense that it retrieves a distance value from the sonar - so you might wonder why we don’t just refer to Sonar_Distance directly. We could do this - but the specification of this function reveals that it is volatile and has side effects: calling it will change the hardware and software state somehow. This may still be okay - but we can’t really know this for certain. And it seems better for our safety monitor to never affect the state of the system. So we’ll look for a means to refer to the last-returned value, instead.

[2] There are other solutions that we could pursue here, but this is by far the most elegant, as it is DRY and avoids YAGNI (we haven’t needed the getters before, so we probably won’t for non-proof purposes).

[3] This is part of the reason we don’t use a Type_Invariant for the safety property. Since a Type_Invariant is asserted at subprogram boundaries, it’s both too restrictive (sometimes we want to temporarily break the invariant, as we see here) and not restrictive enough (we want to assert the invariant in critical Run loops). Furthermore, a Type_Invariant must be attached to a type (as the name suggests). The invariant is asserted over instances of the type at the boundaries of subprograms. For the Mars Rover, we don’t have such an instance that we pass around - the state is global (and internal to the Rover_HAL package). We’d have to do significant refactoring to use a Type_Invariant.

Posted in #SPARK   

About Anthony Aiello

Anthony Aiello

Tony Aiello is SPARK Pro Product Manager, RecordFlux Product Manager, and GNAT Pro for Rust Product Manager at AdaCore.