The problem: reducing an Ada code base
There are times when we have to fix problems - say, compiler issues, or IDEs missing references - on awkwardly large Ada code bases.
When this happens, the first task is generally to reduce the code base to a smaller, more manageable reproducer.
This means hacking away at the code base, removing code bit by bit by hand, while heeding the Ada rules sufficiently to keep reproducing the original issue without raising new ones: a thankless, tedious task. It's frustrating to do this ourselves, and we are loath to ask our customers to do it.
Engineers gotta engineer, and we've been wanting to automate this.
Unfortunately it is not simple to automate - we can't just remove bits of code at random in Ada:
we can't remove a subprogram while it's being referenced
we can't remove a subprogram implementation without removing the specification
we can't remove a "with" clause for a package while that package is still being referenced
It was too hard to do and not worth the effort, until…
Libadalang to the rescue!
Libadalang provides everything we need to code an automatic reducer: a lightweight API to process Ada code as an abstract syntax tree, complete with cross-reference features to precisely find all the references to an entity.
Also, libadalang is able to ingest a million-line code base and process subsequent incremental changes on top of this very quickly - which is absolutely necessary, as we'll want to make thousands of edits per minute.
The interface is borrowed from the popular C-Reduce tool, which does the same job for a single C/C++/OpenCL file:
- Define an "oracle script" which expresses the condition to reproduce. For instance:
  - the program must compile successfully and print "hello", or
  - the compilation must yield this error message and no other error message
- The program makes edits and runs the oracle at each step to verify that the condition we want to test is preserved.
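To make the second kind of condition concrete, here is a minimal sketch of an oracle written in Python. It rests on a few assumptions that are not taken from the adareducer documentation: the oracle reports "condition preserved" by returning 0, as C-Reduce interestingness tests do; the compiler marks error lines with the substring "error:"; and the names only_expected_error and oracle are made up for this illustration.

```python
import subprocess


def only_expected_error(output: str, expected: str) -> bool:
    """True if there is at least one error line and all of them mention `expected`.

    Assumption: the compiler marks error lines with the substring "error:".
    """
    error_lines = [line for line in output.splitlines() if "error:" in line]
    return bool(error_lines) and all(expected in line for line in error_lines)


def oracle(command: list[str], expected: str) -> int:
    """Run the build command; return 0 if the condition still holds, 1 otherwise.

    Assumption: 0 means "still reproduces", following the C-Reduce convention.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    output = result.stdout + result.stderr
    return 0 if only_expected_error(output, expected) else 1
```

A script built on this would call something like oracle(["gprbuild", "-P", "my_project.gpr"], "expected type") and exit with the returned value, where the project file name and the message text are placeholders.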
The program adareducer is born.
For convenience we have packaged it with GNAT Studio (starting with the Pro release 23.1, and the upcoming GitHub continuous release). We've added some useful menus too, for instance to collect your project in a separate sandbox area before the adareducer starts hacking at it. The documentation can be found here. And the code is on GitHub here.
Under the hood: libadalang and a bunch of parlor tricks
With libadalang at hand, the program is pretty simple to write. The tasks to do are:
first try crudely removing some files
then try emptying subprogram bodies
then try emptying the declarative parts in subprograms
then try removing some global variables, then package "with" clauses
Do all of this by bisection: first try removing all the subprograms/statements within one given scope; if the oracle fails after this, try removing only the first half instead, then the second half, and so on. As needed, descend into nested subprograms/statements.
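The bisection strategy can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the adareducer source: the candidates for removal are modelled as a flat list of strings, the oracle is a plain callback, and the names reduce_items and still_reproduces are invented here.

```python
from typing import Callable, List, Sequence, Set


def reduce_items(
    items: Sequence[str],
    still_reproduces: Callable[[List[str]], bool],
) -> List[str]:
    """Greedily drop items by bisection while the oracle keeps passing."""
    removed: Set[int] = set()

    def kept_without(lo: int, hi: int) -> List[str]:
        # What would remain if we also removed items[lo:hi].
        gone = removed | set(range(lo, hi))
        return [item for i, item in enumerate(items) if i not in gone]

    def try_range(lo: int, hi: int) -> None:
        if lo >= hi:
            return
        if still_reproduces(kept_without(lo, hi)):
            removed.update(range(lo, hi))  # accepted, and never revisited
            return
        if hi - lo == 1:
            return  # a single item the oracle needs: keep it
        mid = (lo + hi) // 2
        try_range(lo, mid)   # first half
        try_range(mid, hi)   # then second half

    try_range(0, len(items))
    return [item for i, item in enumerate(items) if i not in removed]
```

With six items "a" to "f" and an oracle that only needs "c" and "e", this keeps exactly those two. When everything in a scope is removable, a single oracle run is enough, which is where most of the time is saved.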
There are a bunch of small tricks on top of this:
we can't remove all statements in a body, we need to leave a "null;" statement
functions require at least one return statement: insert a return statement that calls the function itself, which guarantees that the returned expression has the right type
when we remove statements, replace them with the same number of blank lines, so we can make further edits based on the same analysis without having to maintain a table of line drifts
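The blank-line trick is easy to picture with a small Python sketch. The helper name blank_out and its parameters are hypothetical, chosen for this example only; the point is that the edited text keeps exactly the same number of lines, so source locations computed before the edit stay valid after it.

```python
from typing import List


def blank_out(lines: List[str], first: int, last: int, filler: str = "") -> List[str]:
    """Replace lines first..last (1-based, inclusive) with blank lines.

    If `filler` is given (for instance "   null;"), it is written on the
    first blanked line so the enclosing body still holds one statement.
    The result always has the same number of lines as the input.
    """
    result = list(lines)
    for i in range(first - 1, last):
        result[i] = ""
    if filler:
        result[first - 1] = filler
    return result


source = ["procedure P is", "begin", "   A;", "   B;", "end P;"]
emptied = blank_out(source, 3, 4, filler="   null;")
```

Here emptied still has five lines, and "end P;" is still on line 5. For a function body, the filler would be a return statement calling the function itself instead of "null;".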
Does it work?
Yes! We've been using it at AdaCore on real-life cases. It can reduce a program from a hundred thousand lines to a few hundred lines in a couple of hours. Arguably, a human with good intuition might be faster, but the point is that said human is now free to work on more interesting problems.
(It can still be improved in many ways. In particular, it does not try to be clever. For instance, when it has found a way to reduce the program, this becomes the new program to reduce; it does not backtrack and try to find a better way to reduce the program.)
Can we see it?
It's always a pleasure to see a computer work for you.
adareducer at work