Going After the Low Hanging Bug

by Raphaël Amiard , Yannick Moy , Pierre-Marie de Rodat – Jan 30, 2017

At AdaCore, we've been developing deep static analysis tools (CodePeer and SPARK) since 2008. And if you factor in the fact that some developers of these tools had been developing these tools (or others) for the past decades, it's fair to say that we've got a deep expertise in deep static analysis tools.

At the same time, many Web companies have adopted light-weight static analysis integrated in their agile code-review-commit cycle. Some of these tools are deployed for checking all commits in the huge codebases of Google (Tricorder) or Facebook (Infer). Others are commercial tools implementing hundreds of small checkers (SonarLint, PVS-Studio, Flawfinder, PC-lint, CppCheck). The GNAT compiler implements some of these checkers through its warnings, our coding standard checker GNATcheck implements others, but we are also missing in our technology some useful checkers that rely on intraprocedural or interprocedural control and data flow analysis typically out of reach of a compiler or a coding standard checker.

Some of these checkers are in fact implemented with much greater precision in our deep static analysis tools CodePeer and SPARK, but running these tools requires developing a degree of expertise in the underlying technology to be used effectively, and their use can be costly in terms of resources (machines, people). Hence, these tools are typically used for high assurance software, where the additional confidence provided by deep static analysis outweights the costs. In addition, our tools target mostly absence of run-time errors and a few logical errors like unused variables or statements and a few suspicious constructs. Thus they don't cover the full spectrum of checkers implemented in light-weight static analyzers.

Luckily, the recent Libadalang technology, developed at AdaCore, provides an ideal basis on which to develop such light-weight static analysis, as it can parse and analyze thousands of lines of code in seconds. As an experiment, we implemented two simple checkers using the Python binding of Libadalang, and we found a dozen bugs in the codebases of the tools we develop at AdaCore (including the compiler and static analyzers). That's what we describe in the following.

Checker 1: When Computing is a Waste of Cycles

The first checker detects arguments of some arithmetic and comparison operators which are syntactically identical, in cases where this could be expressed with a constant instead (like "X - X" or "X <= X"):

import libadalang as lal

def same_tokens(left, right):
    return len(left) == len(right) and all(
        le.kind == ri.kind and le.text == ri.text
        for le, ri in zip(left, right)
    )

def has_same_operands(binop):
    return same_tokens(list(binop.f_left.tokens), list(binop.f_right.tokens))

def interesting_oper(op):
    return not isinstance(op, (lal.OpMult, lal.OpPlus, lal.OpDoubleDot,
                               lal.OpPow, lal.OpConcat))

c = lal.AnalysisContext('utf-8')
unit = c.get_from_file(source_file)
for b in unit.root.findall(lal.BinOp):
    if interesting_oper(b.f_op) and has_same_operands(b):
        print 'Same operands for {} in {}'.format(b, source_file)

(Full source available at https://github.com/AdaCore/lib...)

Despite all the extensive testing that is done on our products, this simple 20-lines checker found 1 bug in the GNAT compiler, 3 bugs in CodePeer static analyzer and 1 bug in the GPS IDE! We show them below so that you can convince yourself that they are true bugs, really worth finding.

The bug in GNAT is on the following code in sem_prag.adb:

--  Attribute 'Result matches attribute 'Result

            elsif Is_Attribute_Result (Dep_Item)
              and then Is_Attribute_Result (Dep_Item)
            then
               Matched := True;

One or the references to Dep_Item should really be Ref_Item. Here is the correct version:

--  Attribute 'Result matches attribute 'Result

            elsif Is_Attribute_Result (Dep_Item)
              and then Is_Attribute_Result (Ref_Item)
            then
               Matched := True;

Similarly, one of the three bugs in CodePeer can be found in be-ssa-value_numbering-stacks.adb:

--  Recurse on array base and sliding amounts;
            if VN_Kind (Addr_With_Index) = Sliding_Address_VN
              and then Num_Sliding_Amounts (Addr_With_Index) =
                         Num_Sliding_Amounts (Addr_With_Index)
            then

where one of the references to Addr_With_Index above should really be to Addr_With_Others_VN, and the other two are in be-value_numbers.adb:

return VN_Global_Obj_Id (VN2).Obj_Id_Number =
           VN_Global_Obj_Id (VN2).Obj_Id_Number
           and then VN_Global_Obj_Id (VN2).Enclosing_Module =
           VN_Global_Obj_Id (VN2).Enclosing_Module;

where two of the four references to VN2 should really be to VN1.

The bug in GPS is in language-tree-database.adb:

if Get_Construct (Old_Obj).Attributes /=
                 Get_Construct (New_Obj).Attributes
                 or else Get_Construct (Old_Obj).Is_Declaration /=
                 Get_Construct (New_Obj).Is_Declaration
                 or else Get_Construct (Old_Obj).Visibility /=
                 Get_Construct (Old_Obj).Visibility

The last reference to Old_Obj should really be New_Obj.

Checker 2: When Testing Gives No Information

The second checker detects syntactically identical expressions which are chained together in a chain of logical operators, so that one of the two identical tests is useless (as in "A or B or A"):

import libadalang as lal

def list_operands(binop):
    def list_sub_operands(expr, op):
        if isinstance(expr, lal.BinOp) and type(expr.f_op) is type(op):
            return (list_sub_operands(expr.f_left, op)
                    + list_sub_operands(expr.f_right, op))
        else:
            return [expr]

    op = binop.f_op
    return (list_sub_operands(binop.f_left, op)
            + list_sub_operands(binop.f_right, op))

def is_bool_literal(expr):
    return (isinstance(expr, lal.Identifier)
            and expr.text.lower() in ['true', 'false'])

def has_same_operands(expr):
    ops = set()
    for op in list_operands(expr):
        tokens = tuple((t.kind, t.text) for t in op.tokens)
        if tokens in ops:
            return op
        ops.add(tokens)

def same_as_parent(binop):
    par = binop.parent
    return (isinstance(binop, lal.BinOp)
            and isinstance(par, lal.BinOp)
            and type(binop.f_op) is type(par.f_op))

def interesting_oper(op):
    return isinstance(op, (lal.OpAnd, lal.OpOr, lal.OpAndThen, lal.OpOrElse,
                           lal.OpXor))

c = lal.AnalysisContext('utf-8')
unit = c.get_from_file(source_file)
for b in unit.root.findall(lambda e: isinstance(e, lal.BinOp)):
    if interesting_oper(b.f_op) and not same_as_parent(b):
        oper = has_same_operands(b)
        if oper:
            print 'Same operand {} for {} in {}'.format(oper, b, source_file)

(Full source available at https://github.com/AdaCore/lib...)

Again, this simple 40-lines checker found 4 code quality issues in the GNAT compiler, 2 bugs in CodePeer static analyzer, 1 bug and 1 code quality issue in GPS IDE and 1 bug in QGen code generator. Ouch!

The four code quality issues in GNAT are simply duplicated checks that are not useful. For example in par-endh.adb:

--  Cases of normal tokens following an END

          (Token = Tok_Case   or else
           Token = Tok_For    or else
           Token = Tok_If     or else
           Token = Tok_Loop   or else
           Token = Tok_Record or else
           Token = Tok_Select or else

         --  Cases of bogus keywords ending loops

           Token = Tok_For    or else
           Token = Tok_While  or else

The test "Token = Tok_For" is present twice. Probably better for maintenance to have it once only. The three other issues are similar.

The two bugs in CodePeer are in utils-arithmetic-set_arithmetic.adb:

Result  : constant Boolean
        := (not Is_Singleton_Set (Set1)) and then (not Is_Singleton_Set (Set1))
        and then (Num_Range_Pairs (Set1, Set2) > Range_Pair_Limit);

The second occurrence of Set1 should really be Set2.

The code quality issue in GPS is in mi-parser.adb:

Token = "traceframe-changed" or else
           Token = "traceframe-changed" or else

The last line is useless. The bug in GPS is in vcs.adb:

return (S1.Label = null and then S2.Label = null
              and then S2.Icon_Name = null and then S2.Icon_Name = null)

The first reference to S2 on the last line should really be S1. Note that this issue had already been detected by CodePeer, which is run as part of GPS quality assurance, and it had been fixed on the trunk by one of the GPS developers. Interestingly here, two tools using either a syntactic heuristic or a deep semantic interpretation (allowing CodePeer to detect that "S2.Icon_Name = null" is always true when reaching the last subexpression) reach the same conclusion on that code.

Finally, the bug in QGen is in himoco-change_buffers.ads:

procedure Apply_Update (Self : in out Change_Buffer)
   with Post =>
   --  @req TR-CHB-Apply_Update
   --  All signals, blocks and variables to move shall
   --  be moved into their new container. All signals to merge shall be
   --  merged.
     (
        (for all Elem of Self.Signals_To_Move =>
             (Elem.S.Container.Equals (Elem.Move_Into.all)))
      and then
        (for all Elem of Self.Blocks_To_Move =>
             (if Elem.B.Container.Is_Not_Null then
                  Elem.B.Container.Equals (Elem.Move_Into.all)))
      and then
        (for all Elem of Self.Signals_To_Move =>
             (Elem.S.Container.Equals (Elem.Move_Into.all)))
      and then

The first and third conjuncts in the postcondition are the same. After checking with QGen developers, the final check here was actually meant for Self.Variables_To_Move instead of Self.Signals_To_Move. So we detected here a bug in the specification (expressed as a contract), using a simple syntactic checker!

Setup Recipe

So you actually want to try the above scripts on your own codebase? This is possible right now with your latest GNAT Pro release or the latest GPL release for community & academic users! Just follow the instructions we described in the Libadalang repository, you will then be able to run the scripts inside your favorite Python2 interpreter.

Conclusion

Overall, this little experiment was eye opening, in particular for us who develop these tools, as we did not expect such gross mistakes to have gone through our rigorous reviews and testing. We will continue investigating what benefits light-weight static analysis might provide, and if this investigation is successful, we will certainly include this capability in our tools. Stay tuned!

[cover image by Max Pixel, Creative Commons Zero - CC0]

Posted in #Static Analysis

About Raphaël Amiard

Raphaël Amiard is a software engineer at AdaCore working on tooling and compiler technologies. He joined AdaCore in 2013, after an internship on AdaCore's IDEs. His main interests are compiler technologies, language design, and sound/music making.

About Yannick Moy

Yannick Moy is Head of the Static Analysis Unit at AdaCore. Yannick contributes to the development of SPARK, a software source code analyzer aiming at verifying safety/security properties of programs. He frequently talks about SPARK in articles, conferences, classes and blogs (in particular blog.adacore.com). Yannick previously worked on source code analyzers for PolySpace (now The MathWorks) and at Université Paris-Sud.

About Pierre-Marie de Rodat

Pierre-Marie joined AdaCore in 2013, after he got an engineering degree at EPITA (IT engineering school in Paris). He mainly works on GNATcoverage, GCC, GDB and Libadalang.