On "Compiling With Abstract Interpretation" by Lesbre & Lemerre

Optimizing programming language compilers implement two kinds of operations (“passes”) on their intermediate language: analyses, and transformations. Passes are often interdependent across the two kinds: analyses enable some transformations, and transformations enable more analyses. As a typical compiler (think GCC or Clang) separates the two kinds, a natural question arises: how shall we order the analysis and transformation passes? A particular ordering may work well for one program, resulting in highly optimized code, while failing to optimize another program.

In their PLDI'24 paper, Dorian Lesbre and Matthieu Lemerre describe a technique for executing analyses and transformations at the same time, mutually benefiting from each other, while keeping them all nicely independent in the implementation. Their technique can optimize programs in a way GCC or Clang can’t. Are we entering a new era of simpler and more powerful compilers?

[Read More]

Things Learned at School

It is mid-August. Younger kids are preparing themselves for school (starting in September), and older kids are preparing for their university studies. I am writing my dissertation and reflecting on what I have learned during my studies.

Some say that whatever they have learned, they have learned outside school - at a company or in their life. Perhaps I have been fortunate, but as I was interleaving work and studies, I found the things I learned at university and work to complement each other. I will leave a discussion about what I have learned in companies for future work 1 and focus this post on skills I learned at school.

[Read More]

Separation-logic Style Local Reasoning for Free?

For a long time, deductive software verification faced two challenges: (1) the presence of pointers in computer programs and (2) the variety of programming languages one may want to support. The former was addressed by the family of separation logics, while the latter by a family of operational-semantics-parametric reachability logics. However, I have not seen anyone reasoning effectively about pointer-manipulating programs in reachability logic, and similarly, adapting a separation logic (e.g., Iris) to new languages requires a lot of expert manual work. Moreover, it seems to me that both communities around separation and reachability logic are mostly unaware of (or non-interacting with) the others’ work. But what if we could have the best of the two worlds? Could we have separation-logic style local reasoning for free, without repeatedly proving boring adequacy theorems?

[Read More]

Announcing Minuska

I am announcing Minuska: a truly formal programming language framework. One can use Minuska to specify programming language semantics - formally. From the semantics, Minuska can automatically derive a formally proven interpreter for the language. Minuska is formally specified, implemented, and proved using the Coq theorem prover, with a bit of OCaml used for its command-line interface.

[Read More]

Formal Verification Requires Writing The Same Code Twice?

A question:

In order to formally verify a piece of code, you need a formal specification. Then you need the code. Both the specification and the code describe the same thing - say, an algorithm. Of course, because otherwise the verification would fail. Therefore, formal verification requires describing one thing twice. How is that helpful? Especially if formal verification aims to reduce the amount of code one needs to trust…

TL;DR: If you find yourself coding a description of something twice, you are probably doing something wrong. Maybe you are writing the program, or the specification, in a wrong style. And maybe it is not your fault, because your languages do not allow you to do better.

[Read More]

Highs and Lows of Language-Parametric Formal Verification

Formal verification tools need to understand the programming language they target. For some of the tools, the understanding is represented implicitly in their code, while other are explicitly linked to a separate representation of the language. A tool in the latter category can also be created by semi-automatically deriving it from the language representation, as demonstrated by K-framework. In that case, the language is encoded explicitly in a formal semantics, the verification tool is built separately in a way that allows plugging in an arbitrary formal semantics, and then the two are combined. In this post I explore some advantages and limitations of this “language-parametric” approach.

[Read More]

Design by Contract for Normal People

In their CppCon'23 talk titled “Better Code: Contracts in C++”, Sean Parent and Dave Abrahams of Adobe advise developers on how to use a methodology known as “Design by Contract” to build better and safer systems. While the talk is targeted towards C++ developers, the methodology is fairly general and can be applied to other imperative languages as well. For me, the talk was really fun to watch, so I would definitelly recommend it to others. That said, I have a couple of remarks about the talk that I want to write down and share.

[Read More]

Bug characteristics in open source software

Recently I came across an interesting Empirical Software Engineering paper from 2013. In the paper, titled Bug characteristics in open source software, the authors reported on their studies of bugs in the Linux kernel, Mozilla (today known as Firefox), and the Apache server (all C/C++ projects at that time). They manually classified over 2000 bugs and, among other insights, found that around 2/3 of security vulnerabilities were caused by semantic bugs (which could be also labeled as “functional correctness” issues), compared to around 1/3 of vulnerabilities caused by memory bugs (buffer overflows, double frees etc). Moreover, the proportion of memory bugs was decreasing with time, as the projects were getting more mature. Does this mean that the efforts put into memory-safe languages like Rust are, so to speak, beating a dead bug?

[Read More]

On the Value of Formal Semantics

Recently, a friend of mine asked me: “Do you think that all programming languages should have a formal semantics?”. I do not want to give a simple yes/no answer to that question. Instead, I would like to think about the value formal semantics provides and compare it with the alternatives programming languages have.

[Read More]