On "Compiling With Abstract Interpretation" by Lesbre & Lemerre


Optimizing compilers implement two kinds of operations (“passes”) on their intermediate language: analyses and transformations. The two kinds are often interdependent: analyses enable some transformations, and transformations enable further analyses. Since a typical compiler (think GCC or Clang) keeps the two kinds separate, a natural question arises: how should we order the analysis and transformation passes? A particular ordering may work well for one program, resulting in highly optimized code, while failing to optimize another.

In their PLDI'24 paper, Dorian Lesbre and Matthieu Lemerre describe a technique for running analyses and transformations at the same time, so that they mutually benefit from each other, while keeping them all nicely independent in the implementation. Their technique can optimize programs in a way GCC or Clang can’t. Are we entering a new era of simpler and more powerful compilers?

Consider the following C program (taken directly from the paper).

#define F(x) ((x) * (x) + 1)
void example(int rand) {
    int i = 0;
    int j = 0;
    int z = 4;
    while(z < 125) {
        // Invariant: i = j /\ z%2 == 0
        j = F(j + z % 2);
        if (F(i) == j && z%2 == 0) {
            z += 2;
        } else { // Dead code
            i = rand;
            z += 7;
        }
        i = F(i);
    }
}

The authors say that to prove the loop invariant, one needs to learn that z is even (using a numerical analysis), learn that F(j + z%2) equals F(j) and replace the former with the latter, learn that i = j (using global value numbering), and eliminate the dead code (so that no analysis considers the else branch, which would break all these properties). All of this has to be learned and performed together: for example, if we tried to learn only that z is even, the else branch would break that property.
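To convince myself that the invariant actually holds, here is a direct OCaml transcription of the loop (the names f and example are mine, not the paper's), asserting i = j and z even at the head of every iteration; running it confirms that the assertions hold and that the else branch is never taken.

let f x = x * x + 1

let example rand =
  let i = ref 0 and j = ref 0 and z = ref 4 in
  while !z < 125 do
    (* the claimed invariant: i = j and z is even *)
    assert (!i = !j && !z mod 2 = 0);
    j := f (!j + !z mod 2);
    if f !i = !j && !z mod 2 = 0 then
      z := !z + 2
    else begin
      (* dead code: this branch is never executed *)
      i := rand;
      z := !z + 7
    end;
    i := f !i
  done

let () = example 42

(The values of i and j overflow machine integers after a few iterations, but since both always go through the same computation, wraparound does not affect the invariant.)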

The main idea of the paper is that compiler transformations can communicate with other analyses using the standard framework of abstract interpretation. This idea rests on two crucial observations:

  1. one can design an abstract domain FA that operates on a representation of the program's (abstract) syntax; and
  2. some program transformations can be encoded as functors that transform abstract domains (“transformation functors”); a minimal sketch of such a functor follows this list. Every abstract value of a functor chain (consisting of “reasonable” functors, transformation functors included) that uses FA then contains enough information to reconstruct some program; abstract values that result from an analysis performed with such a functor chain therefore embed programs closely related to the analyzed program (that is, to the program embedded in the initial abstract value).
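
To make the second observation a bit more concrete, here is a deliberately minimal OCaml sketch, using my own names and simplifications rather than the paper's actual interfaces: an abstract domain is a module implementing a small signature, and a transformation functor wraps any such domain, rewriting each expression (here, simplifying x + 0 to x) before the underlying domain ever sees it, so the transformation and the analysis it relies on happen in a single pass.

(* A toy expression language shared by all domains in this sketch. *)
type expr =
  | Const of int
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr

(* A (very) minimal signature of abstract domains: an abstract state,
   a transfer function for assignments, and a join for control-flow merges. *)
module type DOMAIN = sig
  type t
  val init : t
  val assign : string -> expr -> t -> t
  val join : t -> t -> t
end

(* A "transformation functor": it behaves exactly like the underlying
   domain D, except that every expression is rewritten before D sees it. *)
module SimplifyAddZero (D : DOMAIN) : DOMAIN with type t = D.t = struct
  type t = D.t
  let init = D.init

  let rec rewrite e =
    match e with
    | Add (e1, Const 0) | Add (Const 0, e1) -> rewrite e1
    | Add (e1, e2) -> Add (rewrite e1, rewrite e2)
    | Mul (e1, e2) -> Mul (rewrite e1, rewrite e2)
    | Const _ | Var _ -> e

  let assign x e state = D.assign x (rewrite e) state
  let join = D.join
end

Stacking several such functors over a base domain gives something like the functor chains discussed below; placing a syntactic domain such as FA at the bottom of the chain is what makes the rewritten program recoverable from the final abstract values.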

Here is a drawing (page 3 of the paper) of the functor chain used as an illustrative example throughout the paper:

[Figure: the functor chain used in the paper]

The × is a product functor; Q simplifies queries to an abstract domain; T and DG are simple transformation functors. I will not describe the chain in detail; see the paper if interested.

Technically, the FA domain is a free algebra over the algebraic signature of abstract domains. In the upper branch of the chain, the input language of the analyzer is analyzed, while in the lower branch, a static single assignment (SSA) language is used. The Lift functor implements a translation from the input language to SSA.
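To illustrate what a free algebra over that signature looks like, here is another toy OCaml sketch (VALUE_DOMAIN, Intervals, and FreeAlgebra are my illustrative names; the paper's signature is much richer): the same signature of abstract operations can be interpreted semantically, e.g. computing intervals, or freely, building syntax, and in the free interpretation an abstract value is literally a term from which code can be read back.

(* The signature of value-level abstract domain operations. *)
module type VALUE_DOMAIN = sig
  type t
  val const : int -> t
  val var : string -> t
  val add : t -> t -> t
  val mul : t -> t -> t
end

(* A semantic interpretation of the signature: interval arithmetic
   (ignoring overflow for simplicity). *)
module Intervals : VALUE_DOMAIN = struct
  type t = int * int                        (* [lo, hi] *)
  let const n = (n, n)
  let var _ = (min_int, max_int)            (* unknown variable: top *)
  let add (a, b) (c, d) = (a + c, b + d)
  let mul (a, b) (c, d) =
    let products = [a * c; a * d; b * c; b * d] in
    (List.fold_left min max_int products, List.fold_left max min_int products)
end

(* The free interpretation: every operation just builds the corresponding
   term, so an abstract value is the syntax of the expression computing it. *)
type term =
  | Const of int
  | Var of string
  | Add of term * term
  | Mul of term * term

module FreeAlgebra : VALUE_DOMAIN with type t = term = struct
  type t = term
  let const n = Const n
  let var x = Var x
  let add t1 t2 = Add (t1, t2)
  let mul t1 t2 = Mul (t1, t2)
end

Analyzing with a product of the two (intervals paired with terms) is, as I understand it, a baby version of what a functor chain over FA does: the numeric component computes facts while the syntactic component carries enough structure to reconstruct a program.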

Besides developing the theory, the authors also implement the technique both in a real-world program analyzer and in a simplified toy one. In the conclusion, they write:

In future work, it would be interesting to see if lifting functor domains to compiler passes can be an effective method to design compilers…

I am looking forward to it.