Formal Verification Requires Writing The Same Code Twice?


A question:

In order to formally verify a piece of code, you need a formal specification. Then you need the code. Both the specification and the code describe the same thing - say, an algorithm. Of course, because otherwise the verification would fail. Therefore, formal verification requires describing one thing twice. How is that helpful? Especially if formal verification aims to reduce the amount of code one needs to trust…

TL;DR: If you find yourself coding a description of something twice, you are probably doing something wrong. Maybe you are writing the program, or the specification, in a wrong style. And maybe it is not your fault, because your languages do not allow you to do better.

I have heard this multiple times. There is even a StackOverflow question on this topic, with no answers. And I myself have had the same question in the past. Why?

(Of course, the question applies only to methods which require an actual specification.)

Specification versus implementation

Specification and implementation, or some mathematical object and its properties - these are different things. Usually. Generally, a specification says what we want to achieve, and implementation says how do we get there. For example, a specification for a “sort” function should say that the output is a permutation of the input, and that the output is sorted. It should not try to explain how this is to be achieved. An implementation, on the other hand, has to describe precisely what a machine should do to achieve the required result. (And a proof would say how the two are linked.) Notably, there might be multiple different algorithms (bubble sort, quicksort, …) that achieve the goal.

For example, in Coq’s stdpp library, there is an implementation of merge sort that has about 30 lines. And then there are two theorems about the correctness of the algorithm (namely sortedness and permutation) whose statements are just a few characters:

Lemma Sorted_merge_sort `{!Total R} l : Sorted R (merge_sort R l).
Lemma merge_sort_Permutation l : merge_sort R l  l.

Note that the Sorted predicate is totally independent of the sorting algorithm: it is even defined in a more basic standard library that gets shipped with Coq, and can be used without thinking about the particular algorithm (from the stdpp library).

Somtimes this distinction blurs out: typically, when one implementation becomes a specification for another implementation, or when we have multiple specifications connected in an abstraction/refinement relationship. But then, the artifacts involved typically implement different ideas, different algorithms, or different optimizations of the algorithms. At other times, the distinction might get blurred when using inappropriate, simplistic examples.

How do people come up with this idea?

I am under the impression that people come up with the idea of “writing the same thing twice” because of oversimplified introductory materials about formal verification. As a random example, consider this post by stackbuilders. The post is generally good and gives a nice overview of the area of formal verification; it is just that the example used is exactly the case of almost duplicating specification and implementation. It features a function toUppercase

function toUppercase(name: String) -> String {
    for (letter in name) do
        uppercase(letter);
    return name;
}

with the following specification:

function toUppercase(name: String) -> result: String {
    satisfies:
        - result.length == name.length
        - forall (i <- 0..result.length-1). result[i] == uppercase(name[i])
}

So what is the problem with this?

First, from the specification it looks like uppercase is a pure function (otherwise it could not be used in the specification) returning a new character object (so that we can compare it with the result), but in the implementation of toUppercase it is used for mutating the reference it is given (assuming the for loop has a referential semantics). So in order to proceed, we need to fix the implementation:

function toUppercase(name: String) -> String {
    for (letter in name) do
        letter = uppercase(letter);
    return name;
}

We also need to assume that toUppercase takes the string by value, otherwise it would modify the original object and the specification would not hold.

Now, when we got this technical problem out of our way, we can compare the specification and implementation to find the duplication. And really:

  • there is a call to uppercase in both snippets, and the call is with the same value (although it looks syntactically different);
  • there is a ranged-for-loop in the implementation, corresponding to a bounded universal quantification in the specification;
  • there is the check on lenghts in the specification, which is something the ranged-for-loop does automatically in the implementation; and
  • there is the assignment operator inside the loop in the implementation, corresponding to the equality comparison inside the quantified formula in the specification.

So here we have the duplication. Thinking about it, the implementation is just a verbose, imperative way of writing something that could be described like map uppercase in functional syntax (think Haskell, or Coq).

And the specification is a trivial consequence of the specification of the map function (written as the infix operator <$> in Coq + stdpp).

list_lookup_fmap:
forall {A B : Type} (f : A -> B) (l : list A) (i : nat),  (f <$> l) !! i = f <$> l !! i

Therefore, this example is inapropriate for showcasing the value of a formal specification a verification, and might be a source of the confusion similar to that one behind our question.

We do not have to duplicate the code.

When we structure our program propertly, we do not have to write code that duplicates specification - or vice versa. Often, we can use a proper functional abstraction, such as higher-order functions like map in the map uppercase example from the previous section. These typically live in the standard library of any reasonable programming language.

What if I am in C++?

In C++, there exists the Ranges library that allows expressing things like map uppercase in an idiomatic C++ style. In particular, I am looking at std::views::transform. That way one can write compact, loop-free code that expresses one’s intent and does not require much thinking to read (although it might require some thinking to write). But note that you cannot formally verify C++ code anyway, since the language does not have a formal semantics.

What if I am writing a highly-performant C?

You can generate C from a functional code in a proof assistant, like people in the hacl-star project did in their recent Modularity, Code Specialization, and Zero-Cost Abstractions for Program Verification paper.

Conclusion

One does not need to duplicate ideas expressed in code in order to do formal verification. On the contrary: one should not duplicate ideas if one wants to take advantage of existing components. And using proper abstraction, we can get code that is much closer to our intent even without formal verification.