Debugging by proof attempts?


Debugging is costly. Theorem proving is costly. Debugging by attempting to prove a property that does not hold, and then wondering why it cannot be proven - you do not want to do that. How do I know? I have tried it too many times.

Two examples

One situation that comes to mind is from two or three years ago, when I was working on a Coq formalization of matching logic. I wanted to export proofs out of Coq into a Metamath-based mechanization of the logic’s proof system. To do so, I needed to be able to choose concrete names for the output format’s bound variables. I came up with several clever solutions (one of them involved custom caches) and spent days and weeks trying to prove theorems about them, thinking that “when I finally prove that lemma, it will be done”. I was producing tons of useless Coq code during that period. Finally, a senior member of our team, Daniel, advised me to take a break from those proof attempts. I am glad that I accepted his advice (I rarely do), because there was no solution to that particular task - as I proved about a year later with a simple counterexample, perhaps a paragraph long.

Another situation is related to my formal semantics framework, Minuska. Multiple times, I tried to implement a symbolic execution engine that would be provably correct. Inspired by the literature, I decomposed the problem into implementing a formally verified term unification algorithm and wrapping it with some ad-hoc technical details (including a transformation between two data formats). However, I got the technical details slightly wrong, and spent an awful lot of time trying to prove that they were correct (that the transformation preserved some meaning that it, in fact, did not). I even had pen-and-paper and LaTeX proof sketches, but they did not help, because the Coq code had to be written in a different style than pen and paper allowed. If I had just tested the transformation on a few examples, I might have realized what was wrong. Even finishing the (broken) symbolic execution engine and writing some end-to-end tests would likely have revealed that something was off.
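To make the idea concrete: even a handful of example-based checks on a format transformation can expose a meaning-preservation bug long before any proof attempt. Here is a minimal sketch in Python - the term representation and the flatten/unflatten pair are hypothetical stand-ins for illustration, not Minuska’s actual formats or code:

```python
# A toy stand-in for "a transformation between two data formats":
# nested terms ("symbol", [subterms]) flattened into a prefix list
# of (symbol, arity) pairs, and back. Hypothetical, not Minuska code.

def flatten(term):
    sym, args = term
    out = [(sym, len(args))]
    for a in args:
        out.extend(flatten(a))
    return out

def unflatten(flat):
    def go(i):
        sym, arity = flat[i]
        i += 1
        args = []
        for _ in range(arity):
            child, i = go(i)
            args.append(child)
        return (sym, args), i
    term, _ = go(0)
    return term

# A few concrete examples - the kind of cheap check that can reveal
# a broken transformation in seconds rather than in weeks of proofs.
examples = [
    ("x", []),
    ("f", [("x", []), ("y", [])]),
    ("g", [("f", [("x", [])]), ("z", [])]),
]
for t in examples:
    assert unflatten(flatten(t)) == t
```

Had the real transformation been run through even this kind of round-trip check, a failing example would have pointed directly at the wrong technical detail.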

A solution?

Now, the embarrassing part. I have wanted property-based testing (PBT) integrated into Minuska for a long time. I knew it would save me some work. Yet even now, I still do not have PBT integrated there. Setting it up always felt like too much upfront effort that I did not want to invest right now, especially when nobody else would appreciate it (that is, nobody would pay me for it and nobody would accept a paper about it) and when I had things on my mind that would be appreciated by others (like the still-unfinished language-parametric separation logic).
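For readers unfamiliar with PBT: its core loop is small enough to sketch without any framework - generate random inputs, check a property, and report the first counterexample. A stdlib-only Python sketch, where the buggy function and the property are invented purely for illustration:

```python
import random

# The "system under test": a deliberately buggy absolute-value
# function (wrong for inputs below -100). Purely illustrative.
def buggy_abs(x):
    return x if x > 0 else -x - (1 if x < -100 else 0)

# The property: abs is non-negative and symmetric.
def prop_abs(x):
    return buggy_abs(x) >= 0 and buggy_abs(x) == buggy_abs(-x)

# Minimal PBT driver: run the property on many random inputs,
# return the first counterexample found (or None).
def check(prop, gen, runs=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(runs):
        x = gen(rng)
        if not prop(x):
            return x
    return None

counterexample = check(prop_abs, lambda rng: rng.randint(-10_000, 10_000))
print("counterexample:", counterexample)
```

Real PBT libraries add input shrinking and generator combinators on top of this loop, but even this bare version finds the bug almost immediately - which is exactly the cheap feedback that weeks of proof attempts did not give me.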

Even simpler unit tests would have helped. But setting up a proper framework for them, when I can just prove that things work inside Coq, did not feel very attractive. I do have some end-to-end tests, though, because Minuska has parts that live outside Coq and thus cannot be formally reasoned about.

It just needs a little bit of ABC

The trap “to prove it correct, the code just needs this one obvious lemma” is similar to another trap: “to make my examples work, the code just needs this obviously correct change”. However, each trap requires a different kind of ladder to climb out of. In the first case, one needs more concrete evidence that the lemma holds - or that it does not. The solution is to write (or generate) more tests. In the second case, one needs an argument for why the change is indeed correct, and why it will not break examples outside the test suite - which requires a precise specification. I tend to fall into the first trap, likely because “when you have a hammer in your hand, everything looks like a nail”.

A pure speculation

Formal methods have the reputation of requiring an upfront investment that is too high to be justified (see the recent Galois blog post). This is especially the case for theorem proving, and to be fair, this post is exactly about the high cost of theorem proving. However, I am under the impression that the high cost is not only due to the formal nature of these methods. I want to tentatively suggest - and this is pure speculation based only on my experience - that it is also the foreignness of formal methods with respect to software projects that contributes significantly to their upfront costs.

In a similar spirit, someone might say that the fact that it is cheaper to develop a web backend in PHP than in C is not primarily due to C being a poorly designed language, or one intrinsically unsuitable for web development. Rather, one might argue, C is simply too foreign to the area of web development. Similarly, unit testing is too foreign to Coq - I am not aware of any libraries devoted to it.

So, what if we had a platform that would integrate example-based unit and integration testing, property-based testing, and theorem proving? A platform that would enable developers to mix-and-match different development approaches, without requiring them to switch programming languages too much? A platform for developing real, effectful software, with a debug-for-free, pay-as-you-prove cost/benefit model? A platform that is already there, almost? That is something I would like to explore in some of my next posts.

Conclusion

Do not debug programs by trying to prove them correct. And perhaps formal methods engineers have the same problem with informal testing as other engineers have with formal methods. At least, one of us does.