
Revisiting the Mars Rover Safety Monitor
by Anthony Aiello –
Since writing my blog post “Let’s Write a Safety Monitor for a Mars Rover!”, I’ve been giving presentations and a webinar on SPARK and using my work to help illustrate what it’s like writing and proving correctness properties with SPARK. This forced me to look again carefully at what I was proving — there’s nothing like presenting some code in front of an audience to focus attention on what you’re doing! I found I was increasingly unhappy with this loop, in Autonomous_Mode.Run
:
while not State.User_Exit loop
Go_Forward (State);
Find_New_Direction (State);
pragma Loop_Invariant (Rover.Cannot_Crash);
end loop;
The loop invariant does prove, so we know that the Rover isn’t in an unsafe state at that point — but I didn’t like my failure to assert the safety property after Go_Forward
(Rover.Cannot_Crash
isn’t a postcondition of Go_Forward
, here). I proved that, after finding a new direction, the Rover would be in a safe state — but I didn’t prove that after going forward the Rover would be in a safe state.
I addressed this in the original blog post by making an informal argument about the vehicle dynamics and its impact on the correctness of the code:
There are two conditions under which we exit [the
Go_Forward
] loop: (1) if the user presses a button; and (2) if the distance to an obstacle is less than 30 units (which is more conservative than our safety threshold). Unfortunately, neither of these conditions on its own is enough to say that we are safe. For (1), the Rover could be too close to an obstacle (and still be commanded to move straight forward). For (2), we know the upper bound on the distance, but not the lower bound — we’d have to know that the distance is, e.g., greater than or equal to 20 but less than 30. Vehicle dynamics and the update rate of the sonar may guarantee this, but we don’t have this information available simply from the code.
Now, this is a valid approach to dealing with code that depends on real-world assumptions that cannot be checked in software. But it felt unsatisfying for this demo, which I wanted to be more complete and more self-contained.
In this post, I’ll update the Rover demo to formalize the informal argument I made in the prior post. The result is a more satisfying autonomous mode that does a better job of guaranteeing that the Rover cannot crash. You can find the full Mars Rover demo — including the enhancements I describe here — at the Mars Rover demo’s home on github: https://github.com/adaCore/mars-rover-demo.
Proving Go_Forward
#
Here’s the complete Go_Forward
subprogram from before (with a couple of the magic numbers replaced by constants):
Distance_Threshold : constant := 40;
Sonar_Sampling_Delay : constant := 40;
procedure Go_Forward (This : in out Auto_State) with
Pre => Initialized and then
Rover.Cannot_Crash,
Post => Initialized
is
Distance : Unsigned_32;
begin
-- Go forward...
Set_Turn (Straight);
Set_Power (Left, 100);
Set_Power (Right, 100);
-- Rotate the mast and check for obstacle
loop
Check_User_Input (This);
exit when This.User_Exit;
Mast_Control.Next_Mast_Angle (This.Mast, -60, 70, 16);
Distance := Sonar_Distance;
exit when Distance < Distance_Threshold;
Delay_Milliseconds (Sonar_Sampling_Delay);
end loop;
end Go_Forward;
We’d really like to prove that the Rover is safe during the loop and when the subprogram returns. So we should add Rover.Cannot_Crash
to the postcondition and as a loop invariant. We’ll put the loop invariant after the second exit statement, like this:
exit when Distance < Distance_Threshold;
pragma Loop_Invariant (Rover.Cannot_Crash);
Delay_Milliseconds (Sonar_Sampling_Delay);
end loop;
SPARK will prove the loop invariant: at that point in the loop, we are assured that Distance is greater than 40, which is greater than our safety threshold of 20. But SPARK cannot prove the postcondition: we have exited when we fall below the distance threshold of 40 but, as noted in the original blog post, we have no way of knowing that we’re above the safety threshold of 20. The Rover could be moving fast enough that the distance to the obstacle fell from, e.g., 41 to 19 in one loop iteration. We have to bring in additional information.
Unfortunately, the additional information we need doesn’t exist anywhere in the software. Instead, this additional information is a physical property of the system: the Rover just isn’t capable of moving fast enough on its own that it could close the difference between the distance and safety thresholds in 40 ms. The design of software depends on this fact, which we need to communicate to SPARK.
One option is to use an assumption. An assumption — unlike an assertion — will not be proved by SPARK. SPARK will simply use the assumption to help prove subsequent checks. We can add the assumption like this:
pragma Assume (Distance >= Rover.Safety_Distance);
exit when Distance < Distance_Threshold;
pragma Loop_Invariant (Rover.Cannot_Crash);
Delay_Milliseconds (Sonar_Sampling_Delay);
end loop;
But … this still doesn’t prove! Why not?
There are two exit statements in our loop. We’re confident we’ve addressed the postcondition at the second exit statement, because of our new assumption. But we’ve not addressed the first exit statement. (We can verify our thinking by asserting Rover.Cannot_Crash
just before the first exit statement; SPARK will fail to prove this assertion.)
To do this, we need to strengthen the precondition on Go_Forward
. We have said that the Rover must be in a safe state, but Rover.Cannot_Crash
is too weak to enable us to prove Rover.Cannot_Crash
as our postcondition at the first exit, the first time around the loop. Why? Because Rover.Cannot_Crash
is true if the sonar distance is greater than the safety threshold OR we are not moving straight forward. So the precondition holds if, e.g., we are moving backwards and the sonar distance is 5. But then the first thing Go_Forward
does is start moving forward. So if we took the first exit, we wouldn’t be safe anymore. (Moreover, our assumption would definitely be invalid in this case!) So we should replace Rover.Cannot_Crash
in our precondition with the stronger expression Rover_HAL.Get_Sonar_Distance >= Rover.Safety_Distance
1.
Now SPARK can fully prove Go_Forward
, given the assumption that we added.
Relaxing the Assumption #
While this works, I don’t like it. As we just saw, assumptions are a bit dangerous. Since SPARK cannot check them, it’s far too easy to assume something that just isn’t valid. (It’s also sometimes easy to assume something that’s trivially false — and this has disastrous implications for the remainder of the analysis, since in logic False → False is True.)
First, baldly assuming Distance >= Rover.Safety_Distance
is far too strong. What if we made a mistake in our definition of Sonar_Sampling_Delay
and set it to 400 ms? or 4000 ms? Would the Rover still be safe? We have some notion of the Rover’s dynamics that led us to believe that our code was correct; we could encode that, instead.
The Rover must have some maximum speed that it can achieve. Let’s define Max_Speed
to be 0.5 units per millisecond2. Then we can compute the maximum distance the Rover could travel in a given number of milliseconds and use this in our assumption, instead. So if we record the last distance measurement taken, we could say Distance >= Last_Distance - Sonar_Sampling_Delay * Max_Speed
.
Taking types into account, we can rewrite our assumption like this:
pragma Assume
(Distance >=
(declare
Displacement renames
Unsigned_32 (Max_Speed * Float (Sonar_Sampling_Delay));
begin
(if Last_Distance < Displacement then
0
else
Last_Distance - Displacement)));
To do this, we need to create Last_Distance
, which we’ll make Ghost
(since we need it only for proof). We also need to set an initial value for Distance
, since we’ll be reading it (to assign to Last_Distance
) prior to polling the sonar. We’ll start by initializing Distance
to Rover.Safety_Distance
, because that’s in our precondition.
When we try to repeat the proof, it fails: SPARK can’t prove the postcondition. If we assert Rover.Cannot_Crash
before the second exit statement, we see that this is the path along which SPARK is unable to prove the postcondition: the assertion cannot be proved, but SPARK does prove the postcondition under the assumption that the assertion holds.
pragma Assume
(Distance >=
(declare
Displacement renames
Unsigned_32 (Max_Speed * Float (Sonar_Sampling_Delay));
begin
(if Last_Distance < Displacement then
0
else
Last_Distance - Displacement)));
pragma Assert (Rover.Cannot_Crash); -- This is not proved
exit when Distance < Distance_Threshold;
Intuitively, the failure to prove the assertion tells us that Last_Distance
is the problem: SPARK is told that the distance to the obstacle is greater than a function of Last_Distance
, but SPARK’s clearly not seeing that this relationship is as strong as we need.
We can confirm our intuition by strengthening the assertion to Distance >= Rover.Safety_Distance
, which SPARK cannot prove. We can further confirm that the problem is Last_Distance
by adding an assertion about Last_Distance
in front of our assumption.
pragma Assert (Last_Distance >= Distance_Threshold); -- Not proved
pragma Assume
(Distance >=
(declare
Displacement renames
Unsigned_32 (Max_Speed * Float (Sonar_Sampling_Delay));
begin
(if Last_Distance < Displacement then
0
else
Last_Distance - Displacement)));
exit when Distance < Distance_Threshold;
So we need to know that Last_Distance >= Distance_Threshold
for our assumption to establish that Distance >= Rover.Safety_Distance
; if we assert this, SPARK cannot prove it. We need to make three changes to fix this.
First, because Last_Distance
is assigned the value of Distance
that was constrained in the prior iteration of the loop, we have to write the constraint on Distance
in a loop invariant. (Remember that, in SPARK, loops represent cut points in the analysis for any variables assigned during the loop; only type properties and properties established in loop invariants are carried from one loop iteration to the next.) So we add Distance >= Distance_Threshold
as a new loop invariant, next to Rover.Cannot_Crash
. This ensures that SPARK will know that Last_Distance >= Safety_Threshold
on subsequent loop iterations.
But what about the first iteration? This leads us to the second and third changes. We need to strengthen our precondition from Rover_HAL.Get_Sonar_Distance >= Rover.Safety_Distance
to Rover.Get_Sonar_Distance >= Distance_Threshold
and we need to update the initialization of Distance
to match. This will ensure that, on the first iteration, Last_Distance >= Safety_Threshold
.
procedure Go_Forward (This : in out Auto_State) with
Pre => Initialized and then
Rover_HAL.Get_Sonar_Distance >= Distance_Threshold,
Post => Initialized and then
Rover.Cannot_Crash
is
Distance : Unsigned_32 := Distance_Threshold;
With these changes, when we rerun SPARK, the postcondition is proved.
We could stop here, but I personally still don’t like having this assumption in our code. It’s not terribly obvious that we’ve assumed something that needs to be validated. And, if we wanted to leverage this model of our dynamics elsewhere, we’d have to repeat ourselves.
Fortunately, SPARK gives us tools to modularize these kinds of assumptions.
Modularizing the Assumption #
We modularize the assumption in the same way that we might introduce a lemma or describe the contract on a binding to another language: we declare a procedure whose postcondition is the assumption and mark it Ghost and Import.
procedure Rover_Displacement_Model
(Distance : Unsigned_32;
Last_Distance : Unsigned_32;
Ellapsed_Time : Unsigned_32)
with
Global => null,
Post => (Distance >=
(declare
Displacement renames
Unsigned_32 (Max_Speed * Float (Ellapsed_Time));
begin
(if Last_Distance < Displacement then
0
else
Last_Distance - Displacement))),
Ghost,
Import,
Always_Terminates;
Since SPARK’s analysis is modular, when we call this procedure, SPARK will assume the postcondition whenever the precondition (which is empty, here) is met. We can now replace the assumption in Go_Forward
with a “call” of this Ghost procedure3.
Here’s the final form of Go_Forward
that incorporates all of the changes and is fully proved:
Distance_Threshold : constant := 40;
Sonar_Sampling_Delay : constant := 40;
procedure Go_Forward (This : in out Auto_State) with
Pre => Initialized and then
Rover_HAL.Get_Sonar_Distance >= Distance_Threshold,
Post => Initialized and then
Rover.Cannot_Crash
is
Distance : Unsigned_32 := Distance_Threshold;
Last_Distance : Unsigned_32 with Ghost;
begin
-- Go forward...
Set_Turn (Straight);
Set_Power (Left, 100);
Set_Power (Right, 100);
-- Rotate the mast and check for obstacle
loop
Check_User_Input (This);
exit when This.User_Exit;
Mast_Control.Next_Mast_Angle (This.Mast, -60, 70, 16);
Last_Distance := Distance;
Distance := Sonar_Distance;
Rover_Displacement_Model
(Distance, Last_Distance, Sonar_Sampling_Delay);
-- We invoke the Rover displacement model so that SPARK knows the
-- limits on how far the rover can have traveled since the last
-- distance measurement.
exit when Distance < Distance_Threshold;
pragma Loop_Invariant (Distance >= Distance_Threshold);
pragma Loop_Invariant (Rover.Cannot_Crash);
Delay_Milliseconds (Sonar_Sampling_Delay);
end loop;
end Go_Forward;
Back to Run
#
Now that we’ve changed the contract for Go_Forward
, we need to look again at the loop in Run
— because it’s quite clear that it won’t prove in its current state:
while not State.User_Exit loop
Go_Forward (State);
Find_New_Direction (State);
pragma Loop_Invariant (Rover.Cannot_Crash);
end loop;
This loop doesn’t meet the precondition on Go_Forward
. In the first iteration, we know only Rover.Cannot_Crash
from the precondition on Run
— which is weaker than Rover_HAL.Get_Sonar_Distance >= Distance_Threshold
. And in subsequent iterations, we still know only Rover.Cannot_Crash
because that’s all the loop invariant establishes.
No matter what else we do, we need to strengthen the postcondition on Find_New_Direction
to Rover_HAL.Get_Sonar_Distance >= Distance_Threshold
. And since Find_New_Direction
currently has no precondition (other than Initialized
), we might as well put it first in the loop:
while not State.User_Exit loop
Find_New_Direction (State);
Go_Forward (State);
pragma Loop_Invariant (Rover.Cannot_Crash);
end loop;
This way, we could actually drop the precondition on Run
altogether, allowing the Rover to start arbitrarily close to an obstacle and still avoid crashing.
Let’s see what it will take to get Find_New_Direction
to promise Rover_HAL.Get_Sonar_Distance >= Distance_Threshold
.
Improving Find_New_Direction
#
Here’s the current implementation of Find_New_Direction
:
procedure Find_New_Direction (This : in out Auto_State)
with
Pre => Initialized,
Post => Initialized and then
Rover.Cannot_Crash
is
Left_Dist : Unsigned_32 := 0;
Right_Dist : Unsigned_32 := 0;
Timeout, Now : Time;
begin
Now := Clock;
Timeout := Now + Milliseconds (10000);
Set_Turn (Straight);
Set_Power (Left, 0);
Set_Power (Right, 0);
-- Turn the mast back and forth and log the detected distance for the left
-- and right side.
loop
Check_User_Input (This);
Now := Clock;
exit when This.User_Exit or else Now > Timeout;
Mast_Control.Next_Mast_Angle (This.Mast, -60, 70, 4);
if Mast_Control.Last_Angle (This.Mast) <= -40 then
Left_Dist := Sonar_Distance;
end if;
if Mast_Control.Last_Angle (This.Mast) >= 40 then
Right_Dist := Sonar_Distance;
end if;
Delay_Milliseconds (30);
end loop;
if Now > Timeout then
if Left_Dist < 50 and then Right_Dist < 50 then
-- Obstacles left and right, turn around to find a new direction
Turn_Around;
elsif Left_Dist > Right_Dist then
-- Turn left a little
Set_Turn (Around);
Set_Power (Left, -100);
Set_Power (Right, 100);
Delay_Milliseconds (800);
else
-- Turn right a little
Set_Turn (Around);
Set_Power (Left, 100);
Set_Power (Right, -100);
Delay_Milliseconds (800);
end if;
end if;
end Find_New_Direction;
The logic here is that we look to the left and the right and measure the distance to obstacles in both directions and then reorient the vehicle based on those measurements. If both left and right are too close (closer than 50 units), we turn around. Otherwise, we turn towards whichever direction has an obstacle further away.
This is intuitive — but doesn’t guarantee that we’ll end up pointed in a direction with an obstacle more than 40 units — our distance threshold and the precondition for Go_Forward
— away. For example, what if we don’t turn far enough to the left or the right? The delay was chosen based on Rover dynamics — but we have no guarantee (in software!) that we’ll end up pointed in a direction with an obstacle more than 40 units away. What if motor power drops during the turn and we don’t turn as far as expected? Worse, what if the Rover finds itself inside of a (small) box with no exits? The current code would turn around and start moving forward without confirming that the way out is clear!
Now, these are corner cases that we might argue are impossible. But safety is about identifying corner cases and either removing them or arguing through risk assessment that we’re prepared to accept the consequences if the corner case occurs (e.g., the Rover drives into a canyon on Mars and is blocked in by a rock fall).
Before we strengthen the postcondition to Rover_HAL.Get_Sonar_Distance >= Distance_Threshold
and update this subprogram to meet it, we might pause and ask: “how is the current postcondition proved?” given the problems we note above. The answer is, again, our safety property is (much) weaker than the new postcondition we want to prove. Here, it’s met entirely because Find_New_Direction
always leaves the Rover in a state in which it is not moving straight forward. And that’s enough to satisfy our safety property, so the current postcondition proves. But it’s not enough to satisfy the precondition of our now-proved-safe Go_Forward
.
Fortunately, fixing Find_New_Direction
isn’t difficult. Here’s the updated, fully proved implementation:
procedure Find_New_Direction (This : in out Auto_State)
with
Pre => Initialized,
Post => Initialized and then
(This.User_Exit or else
Rover_HAL.Get_Sonar_Distance >= Distance_Threshold)
is
Left_Dist : Unsigned_32 := 0;
Right_Dist : Unsigned_32 := 0;
Timeout, Now : Time;
function Distance_Straight_Ahead return Unsigned_32 with
Side_Effects,
Pre => Initialized,
Post => Initialized and then
-- We have to make this promise because otherwise SPARK
-- cannot know that the result actually came from the
-- sonar.
Rover_HAL.Get_Sonar_Distance = Distance_Straight_Ahead'Result
is
begin
Rover_HAL.Set_Mast_Angle (0);
-- Set the mast to the straight position - this can take a bit of
-- time, so:
Delay_Milliseconds (50);
return Distance : Unsigned_32 do
Distance := Sonar_Distance;
end return;
-- Extended return used because SPARK won't allow a simple return
-- statement here, because the function has side effects.
end Distance_Straight_Ahead;
Distance : Unsigned_32;
begin
Now := Clock;
Timeout := Now + Milliseconds (10000);
Set_Turn (Straight);
Set_Power (Left, 0);
Set_Power (Right, 0);
-- Measure the distance straight ahead
Distance := Distance_Straight_Ahead;
while Distance < Distance_Threshold and not This.User_Exit loop
-- Turn the mast back and forth and log the detected distance for the
-- left and right side.
loop
Check_User_Input (This);
Now := Clock;
exit when This.User_Exit or else Now > Timeout;
Mast_Control.Next_Mast_Angle (This.Mast, -60, 70, 4);
if Mast_Control.Last_Angle (This.Mast) <= -40 then
Left_Dist := Sonar_Distance;
end if;
if Mast_Control.Last_Angle (This.Mast) >= 40 then
Right_Dist := Sonar_Distance;
end if;
Delay_Milliseconds (30);
end loop;
if Now > Timeout then
if Left_Dist < 50 and then Right_Dist < 50 then
-- Obstacles left and right, turn around to find a new direction
Turn_Around;
elsif Left_Dist > Right_Dist then
-- Turn left a little
Set_Turn (Around);
Set_Power (Left, -100);
Set_Power (Right, 100);
Delay_Milliseconds (800);
else
-- Turn right a little
Set_Turn (Around);
Set_Power (Left, 100);
Set_Power (Right, -100);
Delay_Milliseconds (800);
end if;
Distance := Distance_Straight_Ahead;
end if;
end loop;
end Find_New_Direction;
We’ve simply wrapped the current functionality in a while loop that continues searching for a new direction until the distance is greater than or equal to the distance threshold of 40 units. We’ve introduced a new, local function that measures the distance straight ahead because we need this functionality twice — so encapsulation helps us keep the implementation DRY (Don’t Repeat Yourself).
Note that we didn’t need to write a single loop invariant or assertion. SPARK proves that our new postcondition holds with only the updated implementation.
Finalizing Run
There’s one more change we need to make to Run
. Since Find_New_Direction
promises This.User_Exit
or Rover_HAL.Get_Sonar_Distance >= Distance_Threshold
, we need to check for user exit prior to moving forward:
while not State.User_Exit loop
Find_New_Direction (State);
exit when State.User_Exit;
Go_Forward (State);
pragma Loop_Invariant (Rover.Cannot_Crash);
end loop;
And that’s it. We now fully prove the safety property throughout autonomous mode, including while the Rover is driving forward in Go_Forward
.
[1] In safety-critical systems, it is common for a safety property (or a safety invariant) that is proved to hold now to be insufficient (too weak) to guarantee that it will hold later, e.g., during the next iteration of a loop. What’s required in this case is a stronger property - often referred to as an inductive invariant - that is strong enough to guarantee that the safety property will hold in a future state. (And, indeed, we typically need to know that the inductive invariant will guarantee that it holds in a future state, so that we can keep running safely.) As we will see in the next section, the inductive invariant that we need here is that the distance to an obstacle is greater than or equal to 40 units - a much stronger property than the 20-unit safety threshold that’s part of our safety property!
[2] It’s actually been measured to be 0.012 units per millisecond - in the final code, we set Max_Speed
to 0.015 for a bit of additional margin: we don’t need to squeeze all the performance possible out of the system.
[3] There are at least two implicit assumptions underpinning the correctness of invoking our dynamic model, here: (1) that the sonar measurements overlap the axis of motion (i.e., detect obstacles straight forward) and (2) that the elapsed time since the last measurement is no greater than the sonar sampling delay. We could encode these as preconditions and prove they hold at the point of call - but to do so, we’d have to modify our HAL to report the direction and time of the sonar measurement so we can retrieve these via (models of) getters. For the moment, we’ll leave this enhancement as an exercise for the reader.