Ada Crate of the Year: Interactive code search
by Paul Jarrett
Overview #
I learned Ada in 2021 and wrote a tool with it that I use every day; the bulk of it was written within my first two months of using the language.
Septum finds neighborhoods of contiguous lines which match certain conditions and which don’t match other conditions. This is done interactively: you load your data once, and it stays in memory while you apply and remove rules, providing a quick way to whittle down large numbers of search results.
Since it looks for contiguous groups of lines, Septum also finds terms in any order across a context. A side benefit of interactivity is speed: by keeping contents in memory and distributing the search across all cores, Septum provides quick, single-second searches of millions of lines of code and tens of thousands of files. It does this without building indexes or using custom SIMD, network connections or services. It’s not that these things wouldn’t improve Septum, but rather that the straightforward approach has been fast enough.
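As a rough illustration of the idea (not Septum's actual implementation), the sketch below scans a small in-memory list of lines and reports each line whose neighborhood contains every wanted term, in any order, and none of the unwanted ones. The names `Window_Has`, `Matches`, `Width` and the sample data are all assumptions made up for this example.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Context_Search_Sketch is

   type Text_Array is array (Positive range <>) of Unbounded_String;

   function "+" (S : String) return Unbounded_String
     renames To_Unbounded_String;

   --  True when any line of Text (First .. Last) contains Term.
   function Window_Has
     (Text : Text_Array; First, Last : Positive; Term : Unbounded_String)
      return Boolean
   is
   begin
      for I in First .. Last loop
         if Index (Text (I), To_String (Term)) > 0 then
            return True;
         end if;
      end loop;
      return False;
   end Window_Has;

   --  True when the window holds every wanted term (in any order)
   --  and none of the unwanted terms.
   function Matches
     (Text : Text_Array; First, Last : Positive;
      Wanted, Unwanted : Text_Array) return Boolean
   is
   begin
      return (for all T of Wanted => Window_Has (Text, First, Last, T))
        and then
          (for all T of Unwanted => not Window_Has (Text, First, Last, T));
   end Matches;

   --  Hypothetical data, loaded once and kept in memory.
   Source : constant Text_Array :=
     (+"procedure Save (F : File) is",
      +"begin",
      +"   Lock (Mutex);",
      +"   Write (F);",
      +"end Save;",
      +"procedure Load (F : File) is",
      +"begin",
      +"   Read (F);  --  deprecated",
      +"   Lock (Mutex);",
      +"end Load;");

   Wanted   : constant Text_Array := (+"Lock", +"Write");
   Unwanted : constant Text_Array := (1 => +"deprecated");
   Width    : constant := 1;  --  lines of context on each side

begin
   for Center in Source'Range loop
      declare
         First : constant Positive :=
           Positive'Max (Source'First, Center - Width);
         Last  : constant Positive :=
           Positive'Min (Source'Last, Center + Width);
      begin
         if Matches (Source, First, Last, Wanted, Unwanted) then
            Ada.Text_IO.Put_Line
              (Positive'Image (Center) & ": " & To_String (Source (Center)));
         end if;
      end;
   end loop;
end Context_Search_Sketch;
```

With this sample data, only the lines in the middle of Save, where Lock and Write sit next to each other, should be reported. Changing `Wanted`, `Unwanted` or `Width` and rerunning the loop is the in-memory equivalent of applying and removing rules.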
Background #
I work with a team of senior core tech engineers in the video game industry that works on large codebases. We’re expected to solve code problems at all levels, which requires searching through and understanding large amounts of code. Between client and third-party dependencies, this usually involves several million lines of code.
There are a lot of great search tools, such as The Silver Searcher and ripgrep. However, I’m not always certain of what I’m looking for, or I’m trying to take a very general term and then whittle down extraneous sections. Also, I’m usually concerned about the context the terms appear in, regardless of their ordering. You can do this in other tools, but that’s not their designed purpose. For security, we also try to be judicious about our tooling, and often write what we need ourselves.
Why Ada? #
I learn new programming languages from time to time, and had been experimenting with Ada since it’s a systems programming language with a bit of mystique around it, as I had never encountered anyone who uses it. A lot of my initial work in the language was just to try to piece it together from various sources in ways that made sense. I did quite a few write-ups about the language, as well as a talk at FOSDEM 2022, not with the intent of advocacy, but just to give people the ability to understand what Ada is about.
After discovering Ada’s decent string support and built-in concurrency primitives, I decided to try making a context-based search tool, since trying to make something useful for myself would mitigate the risk of learning a lesser-known language. The goal was to see if Ada was suitable for writing a tool I would use every day.
Ada compared to other systems programming languages #
Ada prioritizes ease of understanding for the reader over terseness for the writer, since source code is read many times after being written over the lifetime of a project.
The syntax prefers explicitness and keywords over symbols. Variables must be explicitly typed and declared ahead of time with no implicit conversions allowed. Packages have separate specifications from their implementation bodies, and generics must be explicitly instantiated. This style provides readers with more information nearby, while also simplifying some language rules. For example, instantiated generics and functions look and are used just like their basic forms. Many constructs do what’s intuitive based on their keywords, so within weeks of starting to learn the language, much of the standard library is approachable.
The rules allow authors to describe their intent with elements which add compile-time and run-time checks. Functions described in a package specification can include pre/post conditions as affordances for usage and as checks on correctness. Numerical types can be given ranges, and other types can be given invariants or predicates to check their validity. Lightweight versions of types can be made to describe flavors of the same type with further constraints, or simply to prevent inadvertently mixing values of the types without explicit conversions. These all work to provide improved specificity of the problem domain and clarify program goals.
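A minimal sketch of a few of these features follows. The types and the `Half` and `To_Feet` functions are invented for illustration only. In GNAT, the pre/post conditions and the predicate are only checked at run time when assertions are enabled (for example with `-gnata`).

```ada
with Ada.Text_IO;

procedure Intent_Sketch is

   --  A numeric type with a range: out-of-range values raise
   --  Constraint_Error.
   type Percent is range 0 .. 100;

   --  Two "flavors" of Float that cannot be mixed without an explicit
   --  conversion.
   type Meters is new Float;
   type Feet   is new Float;

   --  A subtype whose predicate restricts the valid values.
   subtype Even is Integer
     with Dynamic_Predicate => Even mod 2 = 0;

   --  Pre/post conditions describe how the function is meant to be used
   --  and what it promises in return.
   function Half (Value : Even) return Integer
     with Pre  => Value >= 0,
          Post => Half'Result * 2 = Value;

   function Half (Value : Even) return Integer is (Value / 2);

   function To_Feet (M : Meters) return Feet is (Feet (M) * 3.28084);

   Progress : Percent := 40;
   Height   : constant Meters := 2.0;

begin
   Progress := Progress + 10;
   Ada.Text_IO.Put_Line ("Progress:" & Percent'Image (Progress));
   Ada.Text_IO.Put_Line ("Half of 42:" & Integer'Image (Half (42)));
   Ada.Text_IO.Put_Line ("Height in feet:" & Feet'Image (To_Feet (Height)));

   --  Rejected at compile time, because Meters and Feet are distinct types:
   --     Total : constant Meters := Height + To_Feet (Height);
end Intent_Sketch;
```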
Despite Ada being a new language for me, when I would step away from a project for weeks at a time, the simplicity and explicitness made it easy to pick back up again.
There are a few stumbling blocks and pitfalls in the language, but overwhelmingly it exceeded all my expectations in terms of expressiveness and modernity. Though I expected an obsolete language with a lot of baggage, the foundations were well done, and decisions which made for comparatively complicated compilers in the 80s and 90s set up orthogonal pieces that continued to work well together as the language evolved. While new versions of many languages often introduce large amounts of entropy and reduce consistency within the language, Ada 2012 did the opposite by tying many previously loose elements together via the new aspect system.
Septum Development Timeline #
Initial Work #
The core of Septum was written in a month using GNAT Community Edition without the help of Alire. A lot of the struggles I had in the first few months would now be completely smoothed over by using today’s version of Alire, from basic toolchain install, to build setup, and simple building and running of programs. Alire is built on top of the GNAT toolchain which uses the gprbuild tool and GPR files to direct the build. I was aware of Alire, but my goal at the time was to use simple defaults from an IDE just to take the language for a spin before I moved onto other things.
The first Septum commit happened about a month after I started learning the language from John Barnes’ “Programming in Ada 2012”, and within a month I was using the core of my new parallelized search utility on large multi-million line codebases.
Moving to Alire #
Most of the rest of the development of Septum from June through the end of the year was to improve the user experience. I took a break after the initial Septum development to port “Ray Tracing in One Weekend” to Ada and finish writing up what I’d learned about the language.
Afterward, when I ran across custom iterators in Ada in Barnes’ book, I decided to try out Alire by making a recursive directory iterator in the same spirit as Rust’s incredible walkdir. Writing the dir_iterators crate helped connect me to people in the Ada community, taught me more about the modern facilities of the language, and convinced me to convert Septum to use Alire. Next came the progress_indicators crate, which provides spinning cursors to show users that the program is still working.
Improving the User Experience and Platform-Specific Builds #
The core of Septum from the initial month-long write proved exceptionally helpful in my daily work. When the Ada “Crate of the Year” competition was announced, I decided to try to improve terminal support by adding coloration, tab completion and hinting.
Having tab completion, hinting and such for a command line tool had always been an aspiration of mine, but would require binding to C functions for console functions and also a platform-specific build since I work on Linux and Windows. Ada was purposely designed without a preprocessor, so platform-specific code usually requires specifying a different implementation source file at build-time.
Since Alire directs the build process, it simplifies this into a GPR file switch based on a variable.
-- trendy_terminal.gpr
Platform : Platform_Type := external ("Trendy_Terminal_Platform");
case Platform is
   when "windows" => Trendy_Terminal_Sources := Trendy_Terminal_Sources & "src/windows";
   when "linux"   => Trendy_Terminal_Sources := Trendy_Terminal_Sources & "src/linux";
   when "macos"   => Trendy_Terminal_Sources := Trendy_Terminal_Sources & "src/mac";
end case;
You can set variables like this and many others from Alire as part of the build.
# alire.toml
[gpr-set-externals.'case(os)']
windows = { Trendy_Terminal_Platform = "windows" }
linux = { Trendy_Terminal_Platform = "linux" }
macos = { Trendy_Terminal_Platform = "macos" }
After writing some prototypes using the platform-specific terminal libraries in C++, writing bindings and equivalent usage in Ada went rather quickly.
Ada’s built-in libraries for interfacing to C and the Import and Convention aspects make working with C code very simple.
GCC can auto-generate bindings for you, but since I hadn’t worked with binding Ada to C before, I didn’t want to be trying to work with magically generated code.
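To give a feel for what a hand-written binding looks like, here is a minimal sketch that imports `strlen` from the C standard library. It is not taken from Trendy Terminal; the procedure name and the choice of function are only for illustration.

```ada
with Ada.Text_IO;
with Interfaces.C;

procedure Binding_Sketch is
   use Interfaces.C;

   --  Hand-written binding to strlen from <string.h>. With Convention => C,
   --  the array parameter is passed as a pointer to its first element.
   function strlen (S : char_array) return size_t
     with Import, Convention => C, External_Name => "strlen";

   --  To_C appends the terminating NUL that C expects.
   Greeting : constant char_array := To_C ("hello");

begin
   Ada.Text_IO.Put_Line
     ("strlen reports" & size_t'Image (strlen (Greeting)));
end Binding_Sketch;
```

`External_Name` is the symbol the linker looks for, which is why macros that rename or hide symbols in C headers matter when writing bindings by hand.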
One problem I’ve run into multiple times with C bindings is how macros get used to hide things I need to bind. For example, even though stdin is supposed to be a FILE*, on Mac it is actually a macro defined to a variable with a different identifier. Windows has similar issues, which cause similar linker errors if you’re not paying attention to your bindings and reading the target source.
The local flags for the termios binding get treated as an array of bits. My Ada version treats a 32-bit integer as if it were just any array of Boolean and the code uses natural assignment rather than bit operations.
type c_lflag_t is (ISIG,
                   ICANON,
                   XCASE,
                   ECHO,
                   ECHOE,
                   ECHOK,
                   ECHONL,
                   NOFLSH,
                   TOSTOP,
                   ECHOCTL,
                   ECHOPRT,
                   ECHOKE,
                   FLUSHO,
                   PENDIN);
for c_lflag_t use
  (ISIG    => 16#0000001#,
   ICANON  => 16#0000002#,
   XCASE   => 16#0000004#,
   ECHO    => 16#0000010#,
   ECHOE   => 16#0000020#,
   ECHOK   => 16#0000040#,
   ECHONL  => 16#0000100#,
   NOFLSH  => 16#0000200#,
   TOSTOP  => 16#0000400#,
   ECHOCTL => 16#0001000#,
   ECHOPRT => 16#0002000#,
   ECHOKE  => 16#0004000#,
   FLUSHO  => 16#0010000#,
   PENDIN  => 16#0040000#
  );
type Local_Flags is array (c_lflag_t) of Boolean
  with Pack, Size => 32;
Here’s an example of setting a flag:
Std_Input.Settings.c_lflag (Linux.ISIG) := not Enabled;
Trendy Test #
I spent quite a bit of time in July working on a still unfinished command line argument library with type-safe commands and arguments, and the Trendy Test library for unit testing. GNATtest was not available through Alire at the time, and also requires auto-generated setup and additional tooling, so I set out to create the simplest-to-use unit testing framework for Ada.
The project aimed to minimize programmer effort. This means simple or automated test registration or discovery, as well as source location reporting of errors, and descriptive errors on test failures.
Trendy Test might be even simpler by utilizing the environment task’s setup of modules to directly register tests, but the current form accomplishes most of its goals. The Criterion project for C and C++ avoids registration with the beautiful approach of having the compiler write test functions into a specific program segment and then reading this list from the PE or ELF at the test program start. I researched doing this in Ada; it seems there is enough low-level control in GNAT, but it was an investment in internals I wasn’t ready to make at the time.
To get the desired usage semantics in Trendy Test, there are some creative and unorthodox uses of exceptions for control flow triggered by dynamic dispatching. These techniques ease testing for the user despite Ada lacking macros and reflection — I strongly advise against doing these sorts of things in normal code.
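The sketch below illustrates only the exception half of that idea, under the assumption that a failed check raises an exception which the runner catches. It leaves out the dynamic dispatching entirely and is not Trendy Test's API: `Check`, `Run`, `Test_Failure` and the two test procedures are hypothetical names.

```ada
with Ada.Exceptions;
with Ada.Text_IO;

procedure Test_Sketch is

   Test_Failure : exception;

   --  A failed check raises, which abandons the rest of the test body.
   procedure Check (Condition : Boolean; Message : String) is
   begin
      if not Condition then
         raise Test_Failure with Message;
      end if;
   end Check;

   type Test_Procedure is access procedure;

   procedure Addition_Works is
   begin
      Check (1 + 1 = 2, "1 + 1 should be 2");
   end Addition_Works;

   procedure Deliberately_Broken is
   begin
      Check (2 * 2 = 5, "2 * 2 should be 5");
      Ada.Text_IO.Put_Line ("never reached");
   end Deliberately_Broken;

   --  The runner turns the exception back into an ordinary test report.
   procedure Run (Name : String; Test : Test_Procedure) is
   begin
      Test.all;
      Ada.Text_IO.Put_Line ("[ PASS ] " & Name);
   exception
      when E : Test_Failure =>
         Ada.Text_IO.Put_Line
           ("[ FAIL ] " & Name & ": "
            & Ada.Exceptions.Exception_Message (E));
   end Run;

begin
   Run ("Addition_Works", Addition_Works'Access);
   Run ("Deliberately_Broken", Deliberately_Broken'Access);
end Test_Sketch;
```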
However, overall I was very happy with Trendy Test’s simplification of unit testing in Ada.
Library Crate Integration #
There was some trouble with pins to crates from a client project when I was building Trendy Terminal and Trendy Test, but that has been much smoothed out. The development experience of working on crates external to your project is much easier than it had been. I keep a relative pin to the code as I’m working locally, and then once I commit and push to its repo, adjust the pinned location to GitHub or revert it to point back to the new crate version once published in Alire.
Using crates with Alire in general is very transparent. There’s no noticeable difference when changing the location of the crate you’re pulling in from local versions, GitHub branches, or through the Alire index. You’re bringing in packages of Ada code, so in general you just use it like any other code.
Evaluation of Alire #
There was a remarkable improvement in my development experience with Ada from April to December last year. What started with an awkward installation process, questions about licensing, confusion about how a build happens and how to integrate external libraries, has become a significantly streamlined experience.
Alire simplifies the cross-platform development experience by providing installation of the compiler toolchain, reasonable project defaults, and a consistent way to build and run your projects across platforms. It also makes creating and publishing additional libraries as crates easy, as well as using them in other projects.
Ease of Making and Using Crates #
Alire crates make packages of other projects available. Since the concept of packages goes all the way back to Ada’s beginnings in the 1980s, it’s easy and natural to separate code into reusable packages to plug into an Alire crate, since namespacing and compilation boundaries fall here already. There’s a layer of GNAT project files, called “GPR files”, gluing everything together, but the defaults for these generated by Alire often “just work.”
Evaluation of Ada #
Ada far surpassed my expectations of what I thought the language could do. While learning the language, for a while it seemed like around every corner was a new feature I was familiar with which hadn’t been showcased. I didn’t expect to find a language which so easily provided the low level controls I might need while also providing RAII, control of binary layout, custom allocators, built-in pre/post conditions and invariants, and easy to use multi-tasking.
The language makes it easy to pick up a little bit at a time, while not punishing you for not knowing everything yet. Each piece you learn builds on how you can improve solving and modeling problems without undercutting previous methods. In particular, the attribute and aspect systems expose additional control while not obscuring the intent of your code.
Despite its surface difference from C++, Ada let me leverage much of my C++ experience by operating under many of the same conceptual models. The compilation model of specifications and bodies is intimately familiar, and it even seems like a lot of the physical design advice from John Lakos’ book, “Large-Scale C++ Software Design”, applies. A lot of Ada feels like I’m just writing the higher-level intent of a C++ program, which is why I tell people that it is like a Pascal flavor of C++ and that the language is about intent.
Writing Ada programs starts out sort of slow. It seems like the opposite of other languages: instead of slowing down as complexity grows, you speed up over time, because the package system and focus on subprograms (functions) make each element you write resilient. Since types are not modules, the same function you wrote for a struct type works just as well if you convert it to a class (tagged type) instead. Packages focus writing your program along lines of behavior instead of only along types, which can help group things by semantics. What you end up with is a program organized by behavior in which the concepts describe the problem being solved in an approachable manner.
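A small sketch of that point about records and tagged types follows. The `Geometry` package and `Rectangle` type are made up for the example. Changing `record` to `tagged record` in the type declaration should leave `Area` and the code that calls it unchanged.

```ada
with Ada.Text_IO;

procedure Behavior_Sketch is

   package Geometry is
      --  Replacing "record" with "tagged record" turns this plain type
      --  into a class; Area and its callers stay exactly the same.
      type Rectangle is record
         Width, Height : Float;
      end record;

      function Area (R : Rectangle) return Float;
   end Geometry;

   package body Geometry is
      function Area (R : Rectangle) return Float is (R.Width * R.Height);
   end Geometry;

   Box : constant Geometry.Rectangle := (Width => 2.0, Height => 3.0);

begin
   Ada.Text_IO.Put_Line ("Area:" & Float'Image (Geometry.Area (Box)));
end Behavior_Sketch;
```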
Summary #
Writing Septum proved to me that Ada could be learned quickly while being capable enough to handle larger workloads. After seeing how much the ecosystem improved and grew in 2021, it’ll be exciting to see how things go in 2022.