Going After the Low Hanging Bug
by Raphaël Amiard , Yannick Moy , Pierre-Marie de Rodat –
At AdaCore, we've been developing deep static analysis tools (CodePeer and SPARK) since 2008. And if you factor in the fact that some developers of these tools had been developing these tools (or others) for the past decades, it's fair to say that we've got a deep expertise in deep static analysis tools.
At the same time, many Web companies have adopted light-weight static analysis integrated in their agile code-review-commit cycle. Some of these tools are deployed for checking all commits in the huge codebases of Google (Tricorder) or Facebook (Infer). Others are commercial tools implementing hundreds of small checkers (SonarLint, PVS-Studio, Flawfinder, PC-lint, CppCheck). The GNAT compiler implements some of these checkers through its warnings, our coding standard checker GNATcheck implements others, but we are also missing in our technology some useful checkers that rely on intraprocedural or interprocedural control and data flow analysis typically out of reach of a compiler or a coding standard checker.
Some of these checkers are in fact implemented with much greater precision in our deep static analysis tools CodePeer and SPARK, but running these tools requires developing a degree of expertise in the underlying technology to be used effectively, and their use can be costly in terms of resources (machines, people). Hence, these tools are typically used for high assurance software, where the additional confidence provided by deep static analysis outweights the costs. In addition, our tools target mostly absence of run-time errors and a few logical errors like unused variables or statements and a few suspicious constructs. Thus they don't cover the full spectrum of checkers implemented in light-weight static analyzers.
Luckily, the recent Libadalang technology, developed at AdaCore, provides an ideal basis on which to develop such light-weight static analysis, as it can parse and analyze thousands of lines of code in seconds. As an experiment, we implemented two simple checkers using the Python binding of Libadalang, and we found a dozen bugs in the codebases of the tools we develop at AdaCore (including the compiler and static analyzers). That's what we describe in the following.
Checker 1: When Computing is a Waste of Cycles
The first checker detects arguments of some arithmetic and comparison operators which are syntactically identical, in cases where this could be expressed with a constant instead (like "X - X" or "X <= X"):
import libadalang as lal
def same_tokens(left, right):
return len(left) == len(right) and all(
le.kind == ri.kind and le.text == ri.text
for le, ri in zip(left, right)
)
def has_same_operands(binop):
return same_tokens(list(binop.f_left.tokens), list(binop.f_right.tokens))
def interesting_oper(op):
return not isinstance(op, (lal.OpMult, lal.OpPlus, lal.OpDoubleDot,
lal.OpPow, lal.OpConcat))
c = lal.AnalysisContext('utf-8')
unit = c.get_from_file(source_file)
for b in unit.root.findall(lal.BinOp):
if interesting_oper(b.f_op) and has_same_operands(b):
print 'Same operands for {} in {}'.format(b, source_file)
(Full source available at https://github.com/AdaCore/lib...)
Despite all the extensive testing that is done on our products, this simple 20-lines checker found 1 bug in the GNAT compiler, 3 bugs in CodePeer static analyzer and 1 bug in the GPS IDE! We show them below so that you can convince yourself that they are true bugs, really worth finding.
The bug in GNAT is on the following code in sem_prag.adb:
-- Attribute 'Result matches attribute 'Result
elsif Is_Attribute_Result (Dep_Item)
and then Is_Attribute_Result (Dep_Item)
then
Matched := True;
One or the references to Dep_Item should really be Ref_Item. Here is the correct version:
-- Attribute 'Result matches attribute 'Result
elsif Is_Attribute_Result (Dep_Item)
and then Is_Attribute_Result (Ref_Item)
then
Matched := True;
Similarly, one of the three bugs in CodePeer can be found in be-ssa-value_numbering-stacks.adb:
-- Recurse on array base and sliding amounts;
if VN_Kind (Addr_With_Index) = Sliding_Address_VN
and then Num_Sliding_Amounts (Addr_With_Index) =
Num_Sliding_Amounts (Addr_With_Index)
then
where one of the references to Addr_With_Index above should really be to Addr_With_Others_VN, and the other two are in be-value_numbers.adb:
return VN_Global_Obj_Id (VN2).Obj_Id_Number =
VN_Global_Obj_Id (VN2).Obj_Id_Number
and then VN_Global_Obj_Id (VN2).Enclosing_Module =
VN_Global_Obj_Id (VN2).Enclosing_Module;
where two of the four references to VN2 should really be to VN1.
The bug in GPS is in language-tree-database.adb:
if Get_Construct (Old_Obj).Attributes /=
Get_Construct (New_Obj).Attributes
or else Get_Construct (Old_Obj).Is_Declaration /=
Get_Construct (New_Obj).Is_Declaration
or else Get_Construct (Old_Obj).Visibility /=
Get_Construct (Old_Obj).Visibility
The last reference to Old_Obj should really be New_Obj.
Checker 2: When Testing Gives No Information
The second checker detects syntactically identical expressions which are chained together in a chain of logical operators, so that one of the two identical tests is useless (as in "A or B or A"):
import libadalang as lal
def list_operands(binop):
def list_sub_operands(expr, op):
if isinstance(expr, lal.BinOp) and type(expr.f_op) is type(op):
return (list_sub_operands(expr.f_left, op)
+ list_sub_operands(expr.f_right, op))
else:
return [expr]
op = binop.f_op
return (list_sub_operands(binop.f_left, op)
+ list_sub_operands(binop.f_right, op))
def is_bool_literal(expr):
return (isinstance(expr, lal.Identifier)
and expr.text.lower() in ['true', 'false'])
def has_same_operands(expr):
ops = set()
for op in list_operands(expr):
tokens = tuple((t.kind, t.text) for t in op.tokens)
if tokens in ops:
return op
ops.add(tokens)
def same_as_parent(binop):
par = binop.parent
return (isinstance(binop, lal.BinOp)
and isinstance(par, lal.BinOp)
and type(binop.f_op) is type(par.f_op))
def interesting_oper(op):
return isinstance(op, (lal.OpAnd, lal.OpOr, lal.OpAndThen, lal.OpOrElse,
lal.OpXor))
c = lal.AnalysisContext('utf-8')
unit = c.get_from_file(source_file)
for b in unit.root.findall(lambda e: isinstance(e, lal.BinOp)):
if interesting_oper(b.f_op) and not same_as_parent(b):
oper = has_same_operands(b)
if oper:
print 'Same operand {} for {} in {}'.format(oper, b, source_file)
(Full source available at https://github.com/AdaCore/lib...)
Again, this simple 40-lines checker found 4 code quality issues in the GNAT compiler, 2 bugs in CodePeer static analyzer, 1 bug and 1 code quality issue in GPS IDE and 1 bug in QGen code generator. Ouch!
The four code quality issues in GNAT are simply duplicated checks that are not useful. For example in par-endh.adb:
-- Cases of normal tokens following an END
(Token = Tok_Case or else
Token = Tok_For or else
Token = Tok_If or else
Token = Tok_Loop or else
Token = Tok_Record or else
Token = Tok_Select or else
-- Cases of bogus keywords ending loops
Token = Tok_For or else
Token = Tok_While or else
The test "Token = Tok_For" is present twice. Probably better for maintenance to have it once only. The three other issues are similar.
The two bugs in CodePeer are in utils-arithmetic-set_arithmetic.adb:
Result : constant Boolean
:= (not Is_Singleton_Set (Set1)) and then (not Is_Singleton_Set (Set1))
and then (Num_Range_Pairs (Set1, Set2) > Range_Pair_Limit);
The second occurrence of Set1 should really be Set2.
The code quality issue in GPS is in mi-parser.adb:
Token = "traceframe-changed" or else
Token = "traceframe-changed" or else
The last line is useless. The bug in GPS is in vcs.adb:
return (S1.Label = null and then S2.Label = null
and then S2.Icon_Name = null and then S2.Icon_Name = null)
The first reference to S2 on the last line should really be S1. Note that this issue had already been detected by CodePeer, which is run as part of GPS quality assurance, and it had been fixed on the trunk by one of the GPS developers. Interestingly here, two tools using either a syntactic heuristic or a deep semantic interpretation (allowing CodePeer to detect that "S2.Icon_Name = null" is always true when reaching the last subexpression) reach the same conclusion on that code.
Finally, the bug in QGen is in himoco-change_buffers.ads:
procedure Apply_Update (Self : in out Change_Buffer)
with Post =>
-- @req TR-CHB-Apply_Update
-- All signals, blocks and variables to move shall
-- be moved into their new container. All signals to merge shall be
-- merged.
(
(for all Elem of Self.Signals_To_Move =>
(Elem.S.Container.Equals (Elem.Move_Into.all)))
and then
(for all Elem of Self.Blocks_To_Move =>
(if Elem.B.Container.Is_Not_Null then
Elem.B.Container.Equals (Elem.Move_Into.all)))
and then
(for all Elem of Self.Signals_To_Move =>
(Elem.S.Container.Equals (Elem.Move_Into.all)))
and then
The first and third conjuncts in the postcondition are the same. After checking with QGen developers, the final check here was actually meant for Self.Variables_To_Move instead of Self.Signals_To_Move. So we detected here a bug in the specification (expressed as a contract), using a simple syntactic checker!
Setup Recipe
So you actually want to try the above scripts on your own codebase? This is possible right now with your latest GNAT Pro release or the latest GPL release for community & academic users! Just follow the instructions we described in the Libadalang repository, you will then be able to run the scripts inside your favorite Python2 interpreter.
Conclusion
Overall, this little experiment was eye opening, in particular for us who develop these tools, as we did not expect such gross mistakes to have gone through our rigorous reviews and testing. We will continue investigating what benefits light-weight static analysis might provide, and if this investigation is successful, we will certainly include this capability in our tools. Stay tuned!
[cover image by Max Pixel, Creative Commons Zero - CC0]