(Many) More Low Hanging Bugs

by Yannick Moy – May 05, 2017

In a previous post, we reported our initial experiments to create lightweight checkers for Ada source code, based on the new Libadalang technology. The two checkers we described discovered 12 issues in the codebase of the tools we develop at AdaCore. In this post, we are reporting on 6 more lightweight checkers, which have discovered 114 new issues in our codebase. These 6 checkers allowed us to detect errors and code quality issues for 4 of them (check_deref_null, check_test_not_null, check_same_test, check_bad_unequal) and refactoring opportunities for 2 of them (check_same_then_else, check_useless_assign). Every checker runs in seconds on our codebase, which made it easy to improve them until the checkers had no false alarms. Currently, none of these checkers uses the recent semantic analysis capability in Libadalang, which might be useful in the future to improve their precision. In each of these checkers, we took inspiration from similar lightweight checkers in other static analysis tools, in particular PVS-Studio and its gallery of real-life examples.

Checkers on Dereference

Our first checker is a favorite of many tools. It checks whether a pointer that has been derefenced, is later tested against the null value. This is suspicious, as we'd expect the sequence of events to be opposite. This can point either to an error (the pointer should not be dereferenced without a null check) or a code quality issue (the null test is useless). In fact, we found both when applying the checker on the codebase of our tools. Here's an example of error found in the GNAT compiler, in g-spipat.adb, where procedure Dump dereferences P.P at line 2088:

procedure Dump (P : Pattern) is

      subtype Count is Ada.Text_IO.Count;
      Scol : Count;
      --  Used to keep track of column in dump output

      Refs : Ref_Array (1 .. P.P.Index);
      --  We build a reference array whose N'th element points to the
      --  pattern element whose Index value is N.

and then much later at line 2156 it checks for P.P being null:

--  If uninitialized pattern, dump line and we are done

      if P.P = null then
         Put_Line ("Uninitialized pattern value");
         return;
      end if;

The code was fixed to declare array Refs after we know P.P is not null. And here is an example of code quality issue also in the GNAT compiler, at line 2797 of g-comlin.adb, where parameter Line is dereferenced and then tested against null:

Sections_List : Argument_List_Access :=
                        new Argument_List'(1 .. 1 => null);
      Found         : Boolean;
      Old_Line      : constant Argument_List := Line.all;
      Old_Sections  : constant Argument_List := Sections.all;
      Old_Params    : constant Argument_List := Params.all;
      Index         : Natural;

   begin
      if Line = null then
         return;
      end if;

The code was fixed by declaring Line as a parameter of a not null access type. In some cases the dereference and the test are both in the same expression, for example in this case in our tool GNATstack, at line 97 of dispatching_calls.adb:

if
           Static_Class.Derived.Contains (Explored_Method.Vtable_Entry.Class)
              and then
           Explored_Method /= null
              and then

The code was fixed here by checking for non-null before dereferencing Explored_Method. Overall, this checker found 11 errors in our codebase and 9 code quality issues.

A second checker in this category looks for places where a test that a pointer is null dominates a dereference of the same pointer. This is, in general, an indication of a logic error, in particular for complex boolean expressions, as shown by the examples from PVS-Studio gallery. We found no such error in our codebase, which may be an indication of our good test coverage. Indeed, any execution of such code will raise an exception in Ada.

Checkers on Tested Expressions

Our first checker on tested expressions looks for identical subexpressions being tested in a chain of if-elsif statements. It points to either errors or code quality issues. Here is an example of error it found in the GNAT compiler, at line 7380 of sem_ch4.adb:

if Nkind (Right_Opnd (N)) = N_Integer_Literal then
                     Remove_Address_Interpretations (Second_Op);

                  elsif Nkind (Right_Opnd (N)) = N_Integer_Literal then
                     Remove_Address_Interpretations (First_Op);
                  end if;

The code was fixed by testing Left_Opnd(N) instead of Right_Opnd(N) in the second test. Overall, this checker found 3 errors in our codebase and 7 code quality issues.

A second checker in this category looks for expressions of the form "A /= B or A /= C" where B and C are different literals, which are always True. In general "and" is meant instead of "or". This checker found one error in our QGen code generator, at line 675 of himoco-blockdiagramcmg.adb:

if Code_Gen_Mode /= "Function"
                 or else Code_Gen_Mode /= "Reusable function"
               then
                  To_Flatten.Append (Obj);
               end if;

Checkers for Code Duplication

Our first checker for code duplication looks for identical code in different branches of an if-statement or case-statement. It may point to typos or logical errors, but in our codebase it pointed only to refactoring opportunities. Still, some of these cause code duplication of more than 20 lines of code, for example at line 1023 of be-checks.adb in CodePeer:

elsif VN_Kind (VN) = Binexpr_VN
              and then Operator (VN) = Logical_And_Op
              and then Int_Sets.Is_In (Big_True, To_Int_Set_Part (Expect))
            then
               --  Recurse to propagate check down to operands of "and"
               Do_Check_Sequence
                 (Check_Kind,
                  Split_Logical_Node (First_Operand (VN)),
                  Srcpos,
                  File_Name,
                  First_Operand (VN),
                  Expect,
                  Check_Level,
                  Callee,
                  Callee_VN,
                  Callee_Expect,
                  Callee_Precondition_Index);
               Do_Check_Sequence
                 (Check_Kind,
                  Split_Logical_Node (Second_Operand (VN)),
                  Srcpos,
                  File_Name,
                  Second_Operand (VN),
                  Expect,
                  Check_Level,
                  Callee,
                  Callee_VN,
                  Callee_Expect,
                  Callee_Precondition_Index);
...
            elsif VN_Kind (VN) = Binexpr_VN
              and then Operator (VN) = Logical_Or_Op
              and then Int_Sets.Is_In (Big_False, To_Int_Set_Part (Expect))
            then
               --  Recurse to propagate check down to operands of "and"
               Do_Check_Sequence
                 (Check_Kind,
                  Split_Logical_Node (First_Operand (VN)),
                  Srcpos,
                  File_Name,
                  First_Operand (VN),
                  Expect,
                  Check_Level,
                  Callee,
                  Callee_VN,
                  Callee_Expect,
                  Callee_Precondition_Index);
               Do_Check_Sequence
                 (Check_Kind,
                  Split_Logical_Node (Second_Operand (VN)),
                  Srcpos,
                  File_Name,
                  Second_Operand (VN),
                  Expect,
                  Check_Level,
                  Callee,
                  Callee_VN,
                  Callee_Expect,
                  Callee_Precondition_Index);

or at line 545 of soap-generator-skel.adb in GPRbuild:

when WSDL.Types.K_Derived =>

                     if Output.Next = null then
                        Text_IO.Put
                          (Skel_Adb,
                           WSDL.Parameters.To_SOAP
                             (N.all,
                              Object    => "Result",
                              Name      => To_String (N.Name),
                              Type_Name => T_Name));
                     else
                        Text_IO.Put
                          (Skel_Adb,
                           WSDL.Parameters.To_SOAP
                             (N.all,
                              Object    =>
                                "Result."
                                  & Format_Name (O, To_String (N.Name)),
                              Name      => To_String (N.Name),
                              Type_Name => T_Name));
                     end if;

                  when WSDL.Types.K_Enumeration =>

                     if Output.Next = null then
                        Text_IO.Put
                          (Skel_Adb,
                           WSDL.Parameters.To_SOAP
                             (N.all,
                              Object    => "Result",
                              Name      => To_String (N.Name),
                              Type_Name => T_Name));
                     else
                        Text_IO.Put
                          (Skel_Adb,
                           WSDL.Parameters.To_SOAP
                             (N.all,
                              Object    =>
                                "Result."
                                  & Format_Name (O, To_String (N.Name)),
                              Name      => To_String (N.Name),
                              Type_Name => T_Name));
                     end if;

Overall, this checker found 62 code quality issues in our codebase.

Our last checker looks for useless assignment to a local variable, where the value is never read subsequently. This can be very obvious, such as this case at line 1067 of be-value_numbers-factory.adb in CodePeer:

Global_Obj.Obj_Id_Number    := Obj_Id_Number (New_Obj_Id);
      Global_Obj.Obj_Id_Number    := Obj_Id_Number (New_Obj_Id);

or more hidden, such as this case at line 895 of bt-xml-reader.adb, still in CodePeer:

if Next_Info.Sloc.Column = Msg_Loc.Column then
                                 Info := Next_Info;
                                 Elem := Next_Cursor;
                              end if;
                              Elem := Next_Cursor;

Overall, this checker found 9 code quality issues in our codebase.

Setup Recipe

So you actually want to try the above scripts on your own codebase? This is possible right now with your latest GNAT Pro release or the latest GPL release for community & academic users! Just follow the instructions we described in the Libadalang repository and you will be able to run the scripts inside your favorite Python2 interpreter.

Conclusion

Summing over the 8 checkers that we implemented so far (referenced in this post and a previous one), we've found and fixed 24 errors and 102 code quality issues in our codebase. This is definitely showing that these kind of checkers are worth integrating in static analysis tools and we look forward to integrating these and more in our static analyzer CodePeer for Ada programs.

Another lesson is that all of these checkers were developed in a couple of hours each, thanks to the powerful Python API available with Libadalang. While we had to develop some boilerplate to traverse the AST in various directions and multiple workarounds to make for the absence of semantic analysis in the (then available) version of Libadalang, this was relatively little work, and work that we can expect to share across similar checkers in the future. We're now looking forward to the version of Libadalang with on-demand semantic analysis, which will allow us to create even more powerful and useful checkers.

[cover image by Courtney Lemon]

Posted in #Static Analysis #Libadalang

About Yannick Moy

Yannick Moy is Head of the Static Analysis Unit at AdaCore. Yannick contributes to the development of SPARK, a software source code analyzer aiming at verifying safety/security properties of programs. He frequently talks about SPARK in articles, conferences, classes and blogs (in particular blog.adacore.com). Yannick previously worked on source code analyzers for PolySpace (now The MathWorks) and at Université Paris-Sud.