In my current job and in my previous job, I was involved in customer support where a user of our product would basically say “your tool does not handle my code correctly.” Whether the problem really is in our tool, or actually with their code, we need to come up with a simplified version of the “incorrect” code. This is not always a trivial task. The customer may be dealing with a fairly large chunk of code, and they may not understand their codebase or our tool well enough to narrow down the problem. It is usually faster for the customer to send as much code as possible, and let the support engineer try to shrink it to a manageable size, as they likely have a better understanding of the tool and what is happening to the example code.
So now we are onto the next problem – how does the support engineer get a copy of the problematic code? In most instances, the customer is very protective of their code – proprietary information, algorithms, or data structures; sometimes even object names can convey information! For example, suppose you had a constant:
Constant_Max_Velocity : constant Velocity_T := 12.3;
Together, that name and value tell you something important about the widget being built.
One could always use search-and-replace to mask some names, but that quickly runs into complicated matching rules and consistency issues. In just the simple example above, how would you replace “constant” in one place and not the other, and do you replace “Velocity” with the same token or with different tokens? What we really want is a tool that understands the language and can intelligently replace tokens based on their context, not just their textual value.
So I took on the task of writing a tool to “obfuscate” a code base. One of the best ways to write a tool like this for Ada is to use Libadalang. This library (available in Ada or Python) allows the user to parse Ada files and gather semantic information. I could then use the semantic information to track every name definition and replace the name wherever it was referenced. With Libadalang, I did not need to worry about the above problems with “constant” and “Velocity”.
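To give a feel for what “tracking every name definition” looks like, here is a minimal sketch that prints each identifier in a file next to the defining name it resolves to. It is not taken from the obfuscator itself. The procedure name Show_References is made up for illustration, and the Libadalang calls (Create_Context, Get_From_File, Traverse, P_Referenced_Defining_Name) are written as I recall them from the Libadalang Ada API, so check them against the version you have installed.

```ada
with Ada.Command_Line;
with Ada.Text_IO; use Ada.Text_IO;

with Libadalang.Analysis;
with Libadalang.Common;

--  Sketch only: pass one Ada source file on the command line.
procedure Show_References is
   package LAL renames Libadalang.Analysis;
   package LALCO renames Libadalang.Common;

   --  Called for every node of the syntax tree
   function Visit (Node : LAL.Ada_Node'Class) return LALCO.Visit_Status is
      use type LALCO.Ada_Node_Kind_Type;
   begin
      if Node.Kind = LALCO.Ada_Identifier then
         declare
            Def : LAL.Defining_Name;
         begin
            --  Ask Libadalang which definition this identifier refers to
            Def := Node.As_Identifier.P_Referenced_Defining_Name;
            if not Def.Is_Null then
               Put_Line (Node.Image & " -> " & Def.Image);
            end if;
         exception
            when LALCO.Property_Error =>
               --  Name resolution can fail, for example when a withed
               --  unit cannot be found
               Put_Line (Node.Image & " -> <unresolved>");
         end;
      end if;
      return LALCO.Into;
   end Visit;

   Context : constant LAL.Analysis_Context := LAL.Create_Context;
begin
   if Ada.Command_Line.Argument_Count /= 1 then
      Put_Line ("usage: show_references <file.adb>");
      return;
   end if;

   declare
      Unit : constant LAL.Analysis_Unit :=
        Context.Get_From_File (Ada.Command_Line.Argument (1));
   begin
      Unit.Root.Traverse (Visit'Access);
   end;
end Show_References;
```

Because every reference leads back to one defining name, an obfuscator can key its replacement table on the definition rather than on the text, which is what keeps the keyword “constant” and the name “Constant_Max_Velocity” apart.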
I chose to write this tool in Ada, because Ada gave me access to the GNATCOLL bindings. With these bindings, I could parse GNAT Project Files to find all the source files I needed. This allowed me to obfuscate an entire codebase: once modified, the new codebase should compile correctly and even perform correctly, although none of the variable names would make any sense!
As a simple example, I wrote a small program that solves the “N-Queens” problem (see Wikipedia). There is not much “proprietary” information in the code, but looking at the original code and the obfuscated code side-by-side shows how much information the names alone convey.
Even in this simple example, we see the information “loss”. On the left, we know we are working with a Board made of Rows and Columns; on the right, all we know is we are looping through some counter.
When I started writing this application, I chose to use the SPARK subset of the Ada language rather than the full language. With this, I could use the SPARK provers to help ensure the robustness and correctness of the code. My plan was to get all of the code to SPARK “Silver” level (proving the absence of run-time errors), with some of the code reaching “Gold” level (proving functions actually do what they are supposed to).
The first iteration of coding was just writing the application to be SPARK-compliant (“Stone” level). Even this low level required some re-thinking of coding practices. For example, I have a subprogram that basically returns the next available obfuscated name. In Ada, I would write this as a function that increments a counter and returns the next item. In SPARK, functions cannot have side effects (such as modifying global data), so this function had to be rewritten as a procedure. Not a difficult task, but a language constraint (and one that makes your code a little safer!).
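As a rough illustration of that rewrite, here is a minimal sketch. The package name Name_Generator, the “X” plus six digits naming scheme, and the visible Counter variable are all made up for this example and are not taken from the tool. I have also not run this exact text through the provers.

```ada
package Name_Generator with SPARK_Mode is

   subtype Name_Index is Natural range 0 .. 999_999;
   subtype Obfuscated_Name is String (1 .. 7);

   --  Hypothetical counter, visible here only to keep the sketch short
   Counter : Name_Index := 0;

   --  Plain Ada would allow a function here:
   --     function Next_Name return Obfuscated_Name;
   --  but its body would modify Counter, a side effect that a SPARK
   --  function may not have. A procedure with an out parameter is fine.
   procedure Next_Name (Name : out Obfuscated_Name) with
     Global => (In_Out => Counter),
     Pre    => Counter < Name_Index'Last,
     Post   => Counter = Counter'Old + 1;

end Name_Generator;

package body Name_Generator with SPARK_Mode is

   procedure Next_Name (Name : out Obfuscated_Name) is
      Digit : constant String := "0123456789";
      Value : Name_Index;
   begin
      Counter := Counter + 1;
      Value   := Counter;
      Name    := "X000000";
      --  Fill in the six digits from right to left
      for I in reverse 2 .. 7 loop
         Name (I) := Digit (Value mod 10 + 1);
         Value    := Value / 10;
      end loop;
   end Next_Name;

end Name_Generator;
```

With Counter starting at zero, the first call sets Name to "X000001". The precondition is what keeps the increment from overflowing once the names run out.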
Next, I moved onto flow analysis (“Bronze” level). This involved adding Global dependency contracts (showing which global data was read/modified by which subprogram) and turning off SPARK for some subprograms that had to deal with non-SPARK compliant run-time code. With SPARK, an interface can be in SPARK mode while its implementation is not in SPARK mode. I used this to wrap some non-SPARK code (like some run-time packages) and make my SPARK analysis happier.
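The wrapping pattern can be sketched as follows. This is an assumption-laden example rather than code from the tool: Name_Store and its two functions are hypothetical, and Ada.Strings.Unbounded simply stands in for “a run-time package that I did not want SPARK to analyze”.

```ada
with Ada.Strings.Unbounded;

--  The visible part is in SPARK, so callers can be analyzed against it
package Name_Store with SPARK_Mode is

   type Name is private;

   function To_String (Item : Name) return String with
     Global => null;

   --  The body is not analyzed, so this postcondition is assumed by the
   --  provers rather than proved
   function To_Name (Text : String) return Name with
     Global => null,
     Post   => To_String (To_Name'Result) = Text;

private
   pragma SPARK_Mode (Off);

   type Name is record
      Value : Ada.Strings.Unbounded.Unbounded_String;
   end record;

end Name_Store;

--  The implementation is outside SPARK and may use anything it likes
package body Name_Store with SPARK_Mode => Off is

   function To_String (Item : Name) return String is
     (Ada.Strings.Unbounded.To_String (Item.Value));

   function To_Name (Text : String) return Name is
     (Value => Ada.Strings.Unbounded.To_Unbounded_String (Text));

end Name_Store;
```

The trade-off is that any contract on such a wrapper is taken on trust, so it is worth keeping these wrappers small and covering them with tests.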
In my mind, the most important step was making sure I didn’t have any of the typical overflow issues, that is, proving the absence of run-time errors (“Silver” level). This is not as easy as it seems, especially when dealing with strings. The simple act of concatenating two strings raises a lot of flags in proving there are no run-time errors.
procedure Do_Something (Y : String; Z : String) with SPARK_Mode is
   X : constant String :=
     Y (Y'First + 1 .. Y'Last) & Z (Z'First .. Z'Last - 1);
begin
   null;  --  X would be used here
end Do_Something;
The most obvious concern is what happens when concatenating the two strings produces a result that is too long for X to hold; that calls for preconditions on the lengths of Y and Z. The subtler problem is the arithmetic on the bounds. Because String indices are Integer values, Y'First could be Integer'Last, and adding one to that raises Constraint_Error (similarly for Z'Last and Integer'First). As a human, you “know” these indices are not going to happen, but for SPARK to accept the code, you have to prove they cannot happen, either through preconditions, better type definitions, or redefining the concatenation operator.
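Taking the precondition route, one possible shape is sketched below. The package String_Ops, the function Join_Trimmed, and the Max_Length bound are hypothetical names chosen for this example. The preconditions are my reasoning about which bounds are needed, and I have not run this exact text through the provers.

```ada
package String_Ops with SPARK_Mode is

   --  Hypothetical upper bound on the length of the second string
   Max_Length : constant := 1_000;

   --  Drops the first character of Y and the last character of Z, then
   --  joins what is left
   function Join_Trimmed (Y, Z : String) return String is
     (Y (Y'First + 1 .. Y'Last) & Z (Z'First .. Z'Last - 1))
   with
     Pre =>
       --  Y is not empty, so its bounds are real indices
       Y'Length >= 1
       --  Z is not empty, so Z'Last - 1 cannot go below zero
       and then Z'Length in 1 .. Max_Length
       --  Leaves room above Y'Last for Y'First + 1 and for the upper
       --  bound of the concatenation
       and then Y'Last <= Integer'Last - Max_Length;

end String_Ops;
```

For example, Join_Trimmed ("abc", "xyz") returns "bcxy". The cost is that every caller now has to establish these bounds, which is why a subtype with a tighter index range can end up being less work overall.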
Finally, I started implementing some contracts on subprogram behavior to prove that each subprogram did what I thought it should do (“Gold” level). For some subprograms that is easy, some of them take a little more thought, and for some, the effort involved in proving correctness (SPARK) is more than the effort involved in showing correctness (testing). This process is still ongoing, and others who work with this application are welcome to contribute!
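To show what a behavioral contract looks like on one of the “easy” cases, here is a small sketch. Replace_All and the package Text_Utils are hypothetical and not part of the obfuscator. The postcondition states the full behavior, and the loop invariants are the kind of help the provers typically need to get there, although I have not run this exact text through them.

```ada
package Text_Utils with SPARK_Mode is

   --  Replaces every occurrence of From in Text with To and leaves all
   --  other characters untouched
   procedure Replace_All (Text : in out String; From, To : Character) with
     Post =>
       (for all I in Text'Range =>
          Text (I) = (if Text'Old (I) = From then To else Text'Old (I)));

end Text_Utils;

package body Text_Utils with SPARK_Mode is

   procedure Replace_All (Text : in out String; From, To : Character) is
   begin
      for I in Text'Range loop
         if Text (I) = From then
            Text (I) := To;
         end if;

         --  Everything up to I has already been processed
         pragma Loop_Invariant
           (for all J in Text'First .. I =>
              Text (J) = (if Text'Loop_Entry (J) = From then To
                          else Text'Loop_Entry (J)));

         --  Everything after I is still as it was on loop entry
         pragma Loop_Invariant
           (for all J in Text'Range =>
              (if J > I then Text (J) = Text'Loop_Entry (J)));
      end loop;
   end Replace_All;

end Text_Utils;
```

The contract is longer than the body, which is a fair picture of the trade-off described above: for a subprogram this small a couple of tests might be cheaper, but the proof covers every possible input.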
This application is available on GitHub for use or investigation. This code obfuscator is still a work in progress and will benefit from future Libadalang evolutions in order to support a more complete set of Ada features. But it is a good start to help those of us in the customer support world help our customers!