AdaCore Blog

Ada Crate of the Year: Interactive code search


by Paul Jarrett Guest Author

Overview

I learned Ada in 2021 and used it to write Septum, a tool I now use every day. The bulk of it was written within my first two months with the language.

Septum finds neighborhoods of contiguous lines that match certain conditions and don’t match others. It works interactively: you load your data once, and it stays in memory while you apply and remove rules, which gives you a quick way to whittle down large numbers of search results.

Since it looks at contiguous groups of lines, Septum also finds terms in any order across a context. A side benefit of interactivity is speed: by keeping file contents in memory and distributing the search across all cores, Septum searches millions of lines of code in tens of thousands of files in about a second. It does this without building indexes, using custom SIMD, or relying on any network connections or services. It’s not that these things wouldn’t improve Septum, but rather that the straightforward approach has been fast enough.
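To make the idea of a matching neighborhood concrete, here is a minimal sketch. It is not Septum's implementation: the sample lines, the `Contains` helper, and the fixed `Radius` are all assumptions made for illustration. A window of contiguous lines matches when it contains every required term, in any order, and none of the excluded ones.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Context_Match_Demo is
   type Text_Array is array (Positive range <>) of Unbounded_String;

   function "+" (S : String) return Unbounded_String
     renames To_Unbounded_String;

   --  Hypothetical file contents, loaded once and kept in memory.
   Lines : constant Text_Array :=
     (+"procedure Load is",
      +"   File : File_Type;",
      +"begin",
      +"   Open (File, Name);",
      +"   Close (File);",
      +"end Load;");

   Radius : constant := 1;  --  lines of context on each side

   --  True when Term occurs somewhere in lines First .. Last.
   function Contains
     (First, Last : Positive; Term : String) return Boolean is
   begin
      for I in First .. Last loop
         if Index (Lines (I), Term) > 0 then
            return True;
         end if;
      end loop;
      return False;
   end Contains;
begin
   for Center in Lines'Range loop
      declare
         First : constant Positive :=
           Positive'Max (Lines'First, Center - Radius);
         Last  : constant Positive :=
           Positive'Min (Lines'Last, Center + Radius);
      begin
         --  Two "find" rules and one "exclude" rule over the window.
         if Contains (First, Last, "Open")
           and then Contains (First, Last, "Close")
           and then not Contains (First, Last, "begin")
         then
            Ada.Text_IO.Put_Line
              ("match around line" & Positive'Image (Center));
         end if;
      end;
   end loop;
end Context_Match_Demo;
```

With this sample data, only the window centered on line 5 satisfies all three rules. The window around line 4 also contains both terms, but it is rejected because it includes the excluded term.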

Background

I work on a team of senior core tech engineers in the video game industry, and we work on large codebases. We’re expected to solve code problems at all levels, which requires searching through and understanding large amounts of code. Between client and third-party dependencies, this usually involves several million lines of code.

There are a lot of great search tools, such as The Silver Searcher and ripgrep. However, I’m not always certain what I’m looking for, or I’m starting from a very general term and then whittling away extraneous matches. I also usually care about the context in which terms appear, regardless of their order. You can do this in other tools, but it’s not what they were designed for. For security, we also try to be judicious about our tooling, and we often write what we need ourselves.

Why Ada?

I learn new programming languages from time to time, and I had been experimenting with Ada because it’s a systems programming language with a bit of mystique around it, since I had never encountered anyone who uses it. A lot of my initial work in the language was just trying to piece it together from various sources in ways that made sense. I did quite a few write-ups about the language, as well as a talk at FOSDEM 2022, not with the intent of advocacy, but just to help people understand what Ada is about.

After discovering Ada’s decent string support and built-in concurrency primitives, I decided to try making a context-based search tool, since building something useful for myself would mitigate the risk of learning a lesser-known language. The goal was to see whether Ada was suitable for writing a tool I would use every day.

Ada compared to other systems programming languages

Ada prioritizes ease of understanding for the reader over terseness for the writer, because source code is read many times over the lifetime of a project after it is written.

The syntax prefers explicitness and keywords over symbols. Variables must be explicitly typed and declared ahead of time, with no implicit conversions allowed. Packages have specifications separate from their implementation bodies, and generics must be explicitly instantiated. This style gives readers more information nearby, while also simplifying some language rules. For example, instantiated generic packages and functions look and are used just like their non-generic forms. Many constructs do what their keywords intuitively suggest, so within weeks of starting to learn the language, much of the standard library is approachable.

The rules let authors describe their intent with declarations that add compile-time and run-time checks. Functions described in a package specification can include pre- and postconditions, both as affordances for usage and as checks on correctness. Numerical types can be given ranges, and other types can be given invariants or predicates to check their validity. Lightweight versions of types can be made to describe flavors of the same type with further constraints, or simply to prevent inadvertently mixing values of the types without explicit conversions. All of these make the problem domain more specific and clarify the program’s goals.
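As a small illustration of these features, the following sketch shows a ranged numeric type, two distinct flavors of the same underlying type, a subtype predicate, and a function contract. All of the names are hypothetical and chosen only for this example. It assumes assertions are enabled (for GNAT, the `-gnata` switch) so that the contract and predicate are checked at run time.

```ada
with Ada.Text_IO;

procedure Intent_Demo is
   --  Ranged numeric type: values outside 0 .. 100 are rejected.
   type Percent is range 0 .. 100;

   --  Two flavors of Positive that cannot be mixed by accident.
   type Line_Number is new Positive;
   type Column_Number is new Positive;

   --  Subtype predicate: only even values are valid.
   subtype Even is Integer
     with Dynamic_Predicate => Even mod 2 = 0;

   --  The contract documents the intended usage and is checked
   --  when assertions are enabled.
   function Half (X : Even) return Integer
     with Pre  => X >= 0,
          Post => Half'Result * 2 = X;

   function Half (X : Even) return Integer is
   begin
      return X / 2;
   end Half;

   Row  : constant Line_Number := 10;
   Col  : Column_Number := 4;
   Done : constant Percent := 50;
begin
   --  Col := Row;  would not compile: the types differ.
   Col := Column_Number (Row);  --  an explicit conversion is fine

   Ada.Text_IO.Put_Line ("column:" & Column_Number'Image (Col));
   Ada.Text_IO.Put_Line ("done:" & Percent'Image (Done));
   Ada.Text_IO.Put_Line ("half:" & Integer'Image (Half (42)));
end Intent_Demo;
```

Passing an odd or negative value to `Half` would fail the predicate or the precondition at the call site, which is where a reader of the specification would expect the error to surface.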

Despite Ada being a new language for me, its simplicity and explicitness made it easy to pick a project back up after stepping away from it for weeks at a time.

There are a few stumbling blocks and pitfalls in the language, but overwhelmingly it exceeded all my expectations in terms of expressiveness and modernity. Though I expected an obsolete language with a lot of baggage, the foundations were well done, and decisions which made for comparatively complicated compilers in the 80s and 90s set up orthogonal pieces that continued to work well together as the language evolved. While new versions of many languages introduce large amounts of entropy and reduce consistency within the language, Ada 2012 did the opposite by tying many previously loose elements together via the new aspect system.

Septum Development Timeline

Initial Work

The core of Septum was written in a month using GNAT Community Edition, without the help of Alire. A lot of the struggles I had in the first few months would now be completely smoothed over by today’s version of Alire, from basic toolchain installation, to build setup, to simply building and running programs. Alire is built on top of the GNAT toolchain, which uses the gprbuild tool and GPR files to direct the build. I was aware of Alire, but my goal at the time was to use simple defaults from an IDE, just to take the language for a spin before I moved on to other things.

The first Septum commit happened about a month after I started learning the language from John Barnes’ “Programming in Ada 2012”, and within a month of that commit I was using the core of my new parallelized search utility on large multi-million-line codebases.

Moving to Alire

Most of the remaining development of Septum, from June through the end of the year, went into improving the user experience. I took a break after the initial Septum development to port “Ray Tracing in One Weekend” to Ada and to finish writing up what I’d learned about the language.

Afterward, when I ran across custom iterators in Barnes’ book, I decided to try out Alire by making a recursive directory iterator in the same spirit as Rust’s incredible walkdir. Writing the dir_iterators crate helped connect me with people in the Ada community, taught me more about the modern facilities of the language, and convinced me to convert Septum to use Alire. Next came the progress_indicators crate, which provides spinning cursors to show users that the program is still working.

Improving the User Experience and Platform-Specific Builds

The core of Septum from the initial month-long write proved exceptionally helpful in my daily work. When the “Ada Crate of the Year” competition was announced, I decided to improve terminal support by adding coloration, tab completion, and hinting.

Having tab completion, hinting, and the like in a command line tool had always been an aspiration of mine, but it would require binding to C functions for console handling and also a platform-specific build, since I work on Linux and Windows. Ada was purposely designed without a preprocessor, so platform-specific code usually requires specifying a different implementation source file at build time.

Since Alire directs the build process, it reduces this to a switch in the GPR file based on a variable.

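-- trendy_terminal.gpr
-- A typed variable needs a string type listing its allowed values.
type Platform_Type is ("windows", "linux", "macos");
-- The value comes from outside the project file (set by Alire, or with -X).
Platform : Platform_Type := external ("Trendy_Terminal_Platform");
-- Append the platform-specific source directory to the source list.
case Platform is
    when "windows" => Trendy_Terminal_Sources := Trendy_Terminal_Sources & "src/windows";
    when "linux"   => Trendy_Terminal_Sources := Trendy_Terminal_Sources & "src/linux";
    when "macos"   => Trendy_Terminal_Sources := Trendy_Terminal_Sources & "src/mac";
end case;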

You can set variables like this, and many others, from Alire as part of the build.

# alire.toml
[gpr-set-externals.'case(os)']
windows = { Trendy_Terminal_Platform = "windows" }
linux = { Trendy_Terminal_Platform = "linux" }
macos = { Trendy_Terminal_Platform = "macos" }

After writing some prototypes in C++ using the platform-specific terminal libraries, writing the bindings and equivalent usage in Ada went rather quickly.

Ada’s built-in libraries for interfacing with C, together with the Import and Convention aspects, make working with C code very simple. GCC can generate bindings automatically, but since I hadn’t bound Ada to C before, I didn’t want to work with magically generated code.
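For readers who have not seen a hand-written binding, here is a minimal sketch using the standard `Interfaces.C` package and the `Import` and `Convention` aspects. It binds the C library’s `strlen`, which is chosen only as a familiar example and is not one of the terminal functions that Trendy Terminal binds.

```ada
with Ada.Text_IO;
with Interfaces.C;

procedure Strlen_Demo is
   use type Interfaces.C.size_t;

   --  Hand-written binding: the Ada declaration describes the C
   --  function, and the aspects tell the compiler where to find it.
   function strlen
     (S : Interfaces.C.char_array) return Interfaces.C.size_t
     with Import, Convention => C, External_Name => "strlen";

   --  To_C appends the terminating nul that C expects.
   Text : constant Interfaces.C.char_array :=
     Interfaces.C.To_C ("Septum");
begin
   pragma Assert (strlen (Text) = 6);
   Ada.Text_IO.Put_Line
     ("length:" & Interfaces.C.size_t'Image (strlen (Text)));
end Strlen_Demo;
```

Once declared, the imported function is called like any other Ada function, which is what makes small hand-written bindings practical.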

One problem I’ve run into multiple times with C bindings is how macros get used to hide things I need to bind. For example, even though stdin is supposed to be a FILE*, on Mac it is actually a macro defined as a variable with a different identifier. Windows has similar issues, which cause linker errors if you’re not paying attention to your bindings and reading the target’s source.

The local flags for the termios binding get treated as an array of bits. My Ada version treats a 32-bit integer as if it were just an array of Boolean, so the code uses natural assignment rather than bit operations.

type c_lflag_t is (ISIG,
                   ICANON,
                   XCASE,
                   ECHO,
                   ECHOE,
                   ECHOK,
                   ECHONL,
                   NOFLSH,
                   TOSTOP,
                   ECHOCTL,
                   ECHOPRT,
                   ECHOKE,
                   FLUSHO,
                   PENDIN);

--  The Linux termios headers define these masks in octal, so the
--  literals are written in base 8.
for c_lflag_t use
   (ISIG    => 8#0000001#,
    ICANON  => 8#0000002#,
    XCASE   => 8#0000004#,
    ECHO    => 8#0000010#,
    ECHOE   => 8#0000020#,
    ECHOK   => 8#0000040#,
    ECHONL  => 8#0000100#,
    NOFLSH  => 8#0000200#,
    TOSTOP  => 8#0000400#,
    ECHOCTL => 8#0001000#,
    ECHOPRT => 8#0002000#,
    ECHOKE  => 8#0004000#,
    FLUSHO  => 8#0010000#,
    PENDIN  => 8#0040000#);

type Local_Flags is array (c_lflag_t) of Boolean
    with Pack, Size => 32;

Here’s an example of setting a flag:

Std_Input.Settings.c_lflag (Linux.ISIG) := not Enabled;
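The technique can be tried in isolation with a self-contained sketch. This is a simplified stand-in rather than the Trendy Terminal code: it indexes the bits by number instead of by an enumeration, and the names `Bit_Index`, `Bit_Flags`, and `To_Word` are assumptions for the example. It also assumes a target where GNAT places index 0 of a packed Boolean array in the least significant bit, as on little-endian machines.

```ada
with Ada.Text_IO;
with Ada.Unchecked_Conversion;
with Interfaces;

procedure Flag_Bits_Demo is
   --  One Boolean per bit of a 32-bit word.
   type Bit_Index is range 0 .. 31;
   type Bit_Flags is array (Bit_Index) of Boolean
     with Pack, Size => 32;

   --  Hypothetical names for two bit positions
   --  (masks 8#1# and 8#10# in the Linux headers).
   ISIG : constant Bit_Index := 0;
   ECHO : constant Bit_Index := 3;

   --  View the same 32 bits as the unsigned integer C would see.
   function To_Word is new Ada.Unchecked_Conversion
     (Source => Bit_Flags, Target => Interfaces.Unsigned_32);

   Flags : Bit_Flags := (others => False);
begin
   Flags (ISIG) := True;              --  C: flags |= ISIG
   Flags (ECHO) := True;              --  C: flags |= ECHO
   Ada.Text_IO.Put_Line
     (Interfaces.Unsigned_32'Image (To_Word (Flags)));

   Flags (ECHO) := not Flags (ECHO);  --  C: flags ^= ECHO
   Ada.Text_IO.Put_Line
     (Interfaces.Unsigned_32'Image (To_Word (Flags)));
end Flag_Bits_Demo;
```

Under the stated bit-order assumption, the two printed values should be 9 and then 1, matching what the equivalent C bit operations would produce.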

Trendy Test

I spent quite a bit of time in July working on a still-unfinished command line argument library with type-safe commands and arguments, and on the Trendy Test library for unit testing. GNATtest was not available through Alire at the time, and it also requires auto-generated setup and additional tooling, so I set out to create the simplest-to-use unit testing framework for Ada.

The project aimed to minimize programmer effort. This means simple or automated test registration or discovery, source location reporting for errors, and descriptive messages on test failures.

Trendy Test might be even simpler if it used the environment task’s setup of modules to register tests directly, but the current form accomplishes most of its goals. The Criterion project for C and C++ avoids registration with the beautiful approach of having the compiler write test functions into a specific program segment and then reading this list from the PE or ELF file when the test program starts. I researched doing this in Ada, and it seems there is enough low-level control in GNAT, but it was an investment in internals I wasn’t ready to make at the time.

To get the desired usage semantics in Trendy Test, there are some creative and unorthodox uses of exceptions for control flow, triggered by dynamic dispatching. These techniques ease testing for the user despite Ada lacking macros and reflection. I strongly advise against doing these sorts of things in normal code.
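The general flavor of exception-driven test control flow can be shown with a deliberately tiny harness. This is not Trendy Test’s API or design: it leaves out dynamic dispatching entirely, and `Require`, `Test_Failure`, and the test procedures are hypothetical names for this sketch. A failed check raises an exception, which abandons the rest of that test and lets the runner record the failure and move on.

```ada
with Ada.Exceptions;
with Ada.Text_IO;

procedure Mini_Test_Demo is
   Test_Failure : exception;

   --  A failed check raises, unwinding out of the current test.
   procedure Require (Condition : Boolean; Message : String) is
   begin
      if not Condition then
         raise Test_Failure with Message;
      end if;
   end Require;

   --  Hypothetical code under test.
   function Double (X : Integer) return Integer is (X * 2);

   procedure Test_Passes is
   begin
      Require (Double (2) = 4, "Double (2) should be 4");
   end Test_Passes;

   procedure Test_Fails is
   begin
      Require (Double (2) = 5, "Double (2) should be 5");
      Ada.Text_IO.Put_Line ("not reached after a failed check");
   end Test_Fails;

   --  Manual registration: a list of test procedures to run.
   type Test_Procedure is access procedure;
   type Test_List is array (Positive range <>) of Test_Procedure;

   Tests : constant Test_List :=
     (Test_Passes'Access, Test_Fails'Access);

   Failures : Natural := 0;
begin
   for Test of Tests loop
      begin
         Test.all;
         Ada.Text_IO.Put_Line ("PASS");
      exception
         when E : Test_Failure =>
            Failures := Failures + 1;
            Ada.Text_IO.Put_Line
              ("FAIL: " & Ada.Exceptions.Exception_Message (E));
      end;
   end loop;
   Ada.Text_IO.Put_Line ("failures:" & Natural'Image (Failures));
end Mini_Test_Demo;
```

The explicit `Tests` list is exactly the kind of manual registration that a framework tries to remove, which is where the more unorthodox techniques come in.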

Overall, however, I was very happy with how Trendy Test simplified unit testing in Ada.

Library Crate Integration

There was some trouble with pins to crates from a client project when I was building Trendy Terminal and Trendy Test, but that has been much smoothed out. Working on crates external to your project is much easier than it used to be. I keep a relative pin to the code while I’m working locally, and then once I commit and push to its repo, I either adjust the pinned location to GitHub or revert it to point at the new crate version once that is published in Alire.

Using crates with Alire is, in general, very transparent. There’s no noticeable difference when you change where a crate comes from, whether that’s a local version, a GitHub branch, or the Alire index. You’re bringing in packages of Ada code, so you just use them like any other code.

Evaluation of Alire

There was a remarkable improvement in my development experience with Ada from April to December of 2021. What started with an awkward installation process, questions about licensing, and confusion about how a build happens and how to integrate external libraries has become a significantly streamlined experience.

Alire simplifies the cross-platform development experience by providing installation of the compiler toolchain, reasonable project defaults, and a consistent way to build and run your projects across platforms. It also makes it easy to create and publish additional libraries as crates, and to use them in other projects.

Ease of Making and Using Crates

Alire crates make packages from other projects available. Since the concept of packages goes all the way back to Ada’s beginnings in the 1980s, it’s easy and natural to separate code into reusable packages to plug into an Alire crate, since namespacing and compilation boundaries fall here already. There’s a layer of GNAT project files, called “GPR files”, gluing everything together, but the defaults that Alire generates for these often “just work.”

Evaluation of Ada

Ada far surpassed my expectations of what I thought the language could do. While learning the language, for a while it seemed like around every corner was a feature I was already familiar with but which hadn’t been showcased. I didn’t expect to find a language which so easily provided the low-level controls I might need while also providing RAII, control of binary layout, custom allocators, built-in pre- and postconditions and invariants, and easy-to-use multitasking.
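RAII is perhaps the least expected item on that list for someone coming from C++, so here is a minimal sketch using the standard `Ada.Finalization` package. The `Guards` package, the `Guard` type, and the `Active` counter are hypothetical names for this example. It assumes Ada 2005 or later, which allows a controlled type to be declared inside a subprogram.

```ada
with Ada.Finalization;
with Ada.Text_IO;

procedure Scope_Guard_Demo is

   package Guards is
      --  Number of guards currently alive.
      Active : Natural := 0;

      type Guard is new Ada.Finalization.Limited_Controlled
        with null record;

      overriding procedure Initialize (G : in out Guard);
      overriding procedure Finalize (G : in out Guard);
   end Guards;

   package body Guards is
      overriding procedure Initialize (G : in out Guard) is
      begin
         Active := Active + 1;
         Ada.Text_IO.Put_Line ("acquire");
      end Initialize;

      overriding procedure Finalize (G : in out Guard) is
      begin
         Active := Active - 1;
         Ada.Text_IO.Put_Line ("release");
      end Finalize;
   end Guards;

begin
   declare
      G : Guards.Guard;  --  Initialize runs at this declaration
   begin
      Ada.Text_IO.Put_Line ("working");
   end;                  --  Finalize runs when the block is left

   Ada.Text_IO.Put_Line
     ("active guards:" & Natural'Image (Guards.Active));
end Scope_Guard_Demo;
```

The cleanup runs whether the block is left normally or through an exception, which is the same guarantee a C++ destructor gives.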

The language makes it easy to pick up a little bit at a time, while not punishing you for not knowing everything yet. Each piece you learn builds on how you can improve solving and modeling problems, without undercutting previous methods. In particular, the attribute and aspect systems expose additional control while not obscuring the intent of your code.

Despite its surface differences from C++, Ada let me leverage much of my C++ experience by operating under many of the same conceptual models. The compilation model of specifications and bodies is intimately familiar, and it even seems like a lot of the physical design advice from John Lakos’ book “Large-Scale C++ Software Design” applies. A lot of Ada feels like I’m just writing the higher-level intent of a C++ program, which is why I tell people that it is like a Pascal flavor of C++ and that the language is about intent.

Writing Ada programs starts out sort of slow. It seems like the opposite of other languages: instead of slowing down due to complexity, you speed up in development over time, because the package system and the focus on subprograms (functions) mean each element you write tends to be resilient. Since types are not modules, the same function you wrote for a struct type works just as well if you convert it to a class (tagged type) instead. Packages focus writing your program along lines of behavior instead of only along types, which can help group things by semantics. What you end up with is a program organized by behavior, in which the concepts describe the problem being solved in an approachable manner.

Summary

Writing Septum proved to me that Ada could be learned quickly while being capable enough to handle larger workloads. After seeing how much the ecosystem improved and grew in 2021, it’ll be exciting to see how things go in 2022.


About Paul Jarrett

Paul is a software engineer who holds a BS in Computer Science from Virginia Tech, and an MS in Computer Science from Georgia Tech.