In this post I sketch how to build a program logic that has a locality property (as separation logic does, embodied by the Framing rule) but is parametric in an operational model of a programming language (as reachability logic is). The post is a follow-up to my recent Separation-logic Style Local Reasoning for Free?, which also discusses some motivation for this line of work. It is still a bit sketchy and is missing formal proofs, but I hope it sheds more light on the technical side of this topic.
Basic (Minuska-style) term-rewriting language definitions
This section is the boring one. Here I review the logical basis of Minuska.
In short, we have:
- primitive values \( V = \{ v, v_1, v_2, v_3, ..., v_n, ..., v_{xyz} \} \);
- symbols \( S = \{ s, s_1, s_2, s_3, ..., s_n, ..., s_{xyz} \} \);
- terms (symbols applied to lists of terms) \( T = \{ t, t_1, t_2, t_3, ..., t_n, ..., t_{xyz} \} \);
- patterns (variables, and symbols applied to lists of patterns) \( P = \{ p, p_1, p_2, p_3, ..., p_n, ..., p_{xyz} \} \);
- function-symbols \( F = \{ f, f_1, f_2, f_3, ..., f_n, ..., f_{xyz} \} \);
- expressions \( E = \{ e, e_1, e_2, e_3, ..., e_n, ..., e_{xyz} \} \);
- expression-terms \( ET = \{ et, et_1, et_2, et_3, ..., et_n, ..., et_{xyz} \} \);
- and valuations \( \Delta = \{ \delta, \delta_1, \delta_2, \delta_3, ..., \delta_n, ..., \delta_{xyz} \} \).
Patterns and expression-terms match terms in valuations, written \( t, \delta \vDash p \) and \( t, \delta \vDash et \).
Rules have the form l => r if c, where l is a pattern, r an expression-term, and c a condition.
Conditions are interesting and I discuss them in the next section.
If you already know Minuska, or K-framework, or something similar, feel free to skip over the Details section and only return to the individual definitions when you need to.
Details
Assume a countable (possibly infinite, but finite is fine too) set \( V \) of primitive values. Typically, this set contains integers (…,-2,-1,0,1,2,…), booleans (true, false), lists or finite maps over other items, and so on - but it does not have to. Whenever I write a meta-variable whose name resembles \( v, v_1, v_6, v_{xyz} \) and do not specify its type, I mean it to be a value.
Later I will consider what structure to give to \( V \) - that is, which operations on \( V \) are allowed.
Assume a countably infinite set \( S \) of symbols and a countably infinite set \( X \) of variables. I will work with terms built from symbols and primitive values, and with patterns built from variables, symbols, and primitive values.
The set \( T \) (of terms) is defined as the smallest set \( T \) such that
- \( v \in V \) implies \( v \in T \) (a value is a term),
- \( s \in S \) and \( t_1,...,t_n \in T \) implies \( s[t_1,...,t_n] \in T \) (a symbol applied to terms is a term).
The set \( P \) (of patterns) is defined as the smallest set \( P \) such that
- \( x \in X \) implies \( x \in P \) (a variable is a pattern)
- \( v \in V \) implies \( v \in P \) (a value is a pattern),
- \( s \in S \) and \( p_1,...,p_n \in P \) implies \( s[p_1,...,p_n] \in P \) (a symbol applied to patterns is a pattern).
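To make the data concrete, here is a minimal sketch of terms and patterns as OCaml data types. The names (`symbol`, `term`, `pattern`, the constructors) are my own, not Minuska's, and primitive values are kept as a type parameter so the sketch does not commit to a particular \( V \):

```ocaml
(* Terms and patterns as OCaml data types; a sketch only.
   The type parameter 'v stands for the set V of primitive values. *)

type symbol = string   (* elements of S *)
type var = string      (* elements of X *)

type 'v term =
  | TVal of 'v                        (* a value is a term *)
  | TApp of symbol * 'v term list     (* s[t1,...,tn] *)

type 'v pattern =
  | PVar of var                       (* a variable is a pattern *)
  | PVal of 'v                        (* a value is a pattern *)
  | PApp of symbol * 'v pattern list  (* s[p1,...,pn] *)
```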
Patterns can match terms when considering a particular valuation. A valuation \( \delta \) is a finite map from variables to terms; the set of all valuations is denoted by \( \Delta \). A pattern matches a term in a valuation whenever the pattern, after replacing variables with corresponding terms, becomes the term.
The pattern matching relation \( \_,\_ \vDash_p \_ \subseteq T \times \Delta \times P \) is defined as the smallest relation such that
- \( v, \delta \vDash_p v \) ;
- \( t, (\{ (x, t) \} \cup \delta) \vDash_p x \) ; and
- \( t_i, \delta \vDash_p p_i \) for all \( i \in \{ 1,...,n \} \) implies \( s[t_1,...,t_n], \delta \vDash_p s[p_1,...,p_n] \)
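As a sanity check, the matching relation can be read as a recursive boolean function. This is only a sketch (re-declaring the types above so it stands alone); a valuation is represented as an association list from variables to terms, and values are compared with structural equality:

```ocaml
(* t, δ ⊨_p p  as a boolean check; a sketch only. *)

type symbol = string
type var = string
type 'v term = TVal of 'v | TApp of symbol * 'v term list
type 'v pattern = PVar of var | PVal of 'v | PApp of symbol * 'v pattern list
type 'v valuation = (var * 'v term) list  (* finite map from variables to terms *)

let rec matches (t : 'v term) (delta : 'v valuation) (p : 'v pattern) : bool =
  match t, p with
  | TVal v, PVal v' -> v = v'                      (* v, δ ⊨ v *)
  | _, PVar x -> List.assoc_opt x delta = Some t   (* t, {(x,t)} ∪ δ ⊨ x *)
  | TApp (s, ts), PApp (s', ps) ->                 (* s[t1,...,tn], δ ⊨ s[p1,...,pn] *)
      s = s'
      && List.length ts = List.length ps
      && List.for_all2 (fun ti pi -> matches ti delta pi) ts ps
  | _ -> false
```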
I will often write simply \( \vDash \) to mean \( \vDash_p \). Now, I want to be able to model programming language semantics using rewrite rules of the shape
p_l => p_r when c_1,...,c_k
where p_l
, p_r
are some kind of patterns and c_1
,…,c_k
conditions. I will talk about conditions later; for now let me focus on the kind-of patterns on the right-hand side.
I want to be able to perform operations on primitive values. For example, I want to be able to invoke integer multiplication when defining how the multiplication operator behaves in a particular programming language. Therefore, let us assume that we have some algebraic signature representing function symbols that we can use on the right-hand side, and an algebra of operations over ground terms. The following two definitions are standard.
Right-hand sides of rewrite rules will be expression-terms, which are like terms but with an additional constructor that embeds an expression.
Given an algebraic signature \( \mathcal{F} = (F, a_F) \), the set \( E \) of expressions is defined as the smallest set \( E \) such that
- \( x \in X \) implies \( x \in E \) (a variable is an expression) ;
- \( v \in V \) implies \( v \in E \) (a value is an expression) ; and
- \( f \in F \) and \( e_1,...,e_n \in E \) and \( n = a_{F}(f) \) implies \( f(e_1,...,e_n) \in E \) .
Given a valuation \( \delta \) and an expression \( e \) such that all variables of \( e \) are in the domain of \( \delta \), the value of \( e \) in \( \delta \), written \( \delta(e) \), is defined by extending \( \delta \) beyond variables as follows:
- \( \delta(v) = v \); and
- \( \delta(f(e_1,...,e_n)) = r_{f}(\delta(e_1),...,\delta(e_n)) \), where \( r_f \) is the interpretation of \( f \) in the algebra over ground terms.
Given an algebraic signature \( \mathcal{F} = (F, a_F) \), the set \( ET \) of expression-terms is defined as the smallest set \( ET \) such that
- \( e \in E \) implies \( e \in ET \) (an expression is an expression-term) ;
- \( s \in S \) and \( et_1,...,et_n \in ET \) implies \( s[et_1,...,et_n] \in ET \) (a symbol applied to expression terms is an expression term).
The semantics of expression-terms with respect to terms is given by the following relation.
The expression-term matching relation \( \_,\_ \vDash_{et} \_ \subseteq T \times \Delta \times ET \) is defined as the smallest relation such that
- \( \delta(e), \delta \vDash_{et} e \) ;
- \( t_i, \delta \vDash_{et} et_i \) for all \( i \in \{ 1,...,n \} \) implies \( s[t_1,...,t_n], \delta \vDash_{et} s[et_1,...,et_n] \)
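Again as a sketch only: since the relation \( \vDash_{et} \) is functional in \( \delta \), the term matched by an expression-term can be computed directly. The interpretation of function symbols is passed in as a parameter `interp`, standing in for the algebra \( (r_f)_{f \in F} \); all names here are my own:

```ocaml
(* Expressions, expression-terms, and their evaluation; a sketch only. *)

type symbol = string
type var = string
type fsym = string   (* function symbols of the signature F *)
type 'v term = TVal of 'v | TApp of symbol * 'v term list
type 'v valuation = (var * 'v term) list

type 'v expr =
  | EVar of var                      (* a variable is an expression *)
  | EVal of 'v                       (* a value is an expression *)
  | EFun of fsym * 'v expr list      (* f(e1,...,en) *)

type 'v eterm =
  | ETExpr of 'v expr                (* an expression is an expression-term *)
  | ETApp of symbol * 'v eterm list  (* s[et1,...,etn] *)

(* δ(e): assumes every variable of e is in the domain of δ. *)
let rec eval_expr (interp : fsym -> 'v term list -> 'v term)
                  (delta : 'v valuation) (e : 'v expr) : 'v term =
  match e with
  | EVar x -> List.assoc x delta
  | EVal v -> TVal v
  | EFun (f, es) -> interp f (List.map (eval_expr interp delta) es)

(* The unique term t such that  t, δ ⊨_et et. *)
let rec eval_eterm interp delta (et : 'v eterm) : 'v term =
  match et with
  | ETExpr e -> eval_expr interp delta e
  | ETApp (s, ets) -> TApp (s, List.map (eval_eterm interp delta) ets)
```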
Reachability-Logic-Style conditions
Recall that I want rules to have the shape l => r if c, or more generally l => r if c_1,...,c_k, where l is a pattern, r an expression-term, and c, c_1,...,c_k are conditions. The general idea is that there is yet another \( \vDash \) relation, namely \( \vDash_c \), between valuations and conditions. Then, for every language definition \( R \) (a finite set of rules), a one-step relation \( \rightsquigarrow_R \) on terms is defined by: \( t \rightsquigarrow_R t' \) iff there exist a rule l => r if c_1,...,c_k in \( R \) and a valuation \( \delta \) such that
- \( t, \delta \vDash l \) ;
- \( t^\prime, \delta \vDash r \); and
- \( \delta \vDash c_i \) for every \( i \in \{ 1,...,k \} \).
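Putting the pieces together, here is a sketch of the one-step relation \( \rightsquigarrow_R \): find a rule whose left-hand side matches (computing the witnessing valuation), check the conditions, and evaluate the right-hand side. For brevity, conditions are represented abstractly as predicates over valuations and the right-hand side as a function from valuations to terms; both stand in for the syntactic objects discussed in this post:

```ocaml
(* A sketch of the one-step relation ⇝_R. *)

type symbol = string
type var = string
type 'v term = TVal of 'v | TApp of symbol * 'v term list
type 'v pattern = PVar of var | PVal of 'v | PApp of symbol * 'v pattern list
type 'v valuation = (var * 'v term) list

(* Compute a valuation δ (extending [delta]) such that  t, δ ⊨ p , if any exists. *)
let rec match_pat (t : 'v term) (p : 'v pattern) (delta : 'v valuation)
  : 'v valuation option =
  match t, p with
  | TVal v, PVal v' -> if v = v' then Some delta else None
  | _, PVar x ->
      (match List.assoc_opt x delta with
       | None -> Some ((x, t) :: delta)                 (* bind x to t *)
       | Some t' -> if t = t' then Some delta else None)
  | TApp (s, ts), PApp (s', ps)
    when s = s' && List.length ts = List.length ps ->
      List.fold_left2
        (fun acc ti pi ->
           match acc with None -> None | Some d -> match_pat ti pi d)
        (Some delta) ts ps
  | _ -> None

type 'v rule = {
  lhs   : 'v pattern;
  rhs   : 'v valuation -> 'v term;        (* stands in for the expression-term r *)
  conds : ('v valuation -> bool) list;    (* stand in for c_1,...,c_k *)
}

(* One step: the first applicable rule determines the successor term. *)
let step (rules : 'v rule list) (t : 'v term) : 'v term option =
  List.find_map
    (fun r ->
       match match_pat t r.lhs [] with
       | Some delta when List.for_all (fun c -> c delta) r.conds ->
           Some (r.rhs delta)
       | _ -> None)
    rules
```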
I haven’t talked about conditions yet, so let me discuss them now. The classic reachability-logic-style way of implementing conditions is to let them be quantifier-free formulas of first-order logic with equality. A similar schema is implemented in the K-framework, and a simplified one in Minuska. Technically, to implement this idea, primitive values are drawn from the same algebra that is used by expression-terms on the right-hand side of rules, and this algebra would have \( T \) as its carrier. However, as I will show, this is too powerful and makes the Framing rule unsound.
Let me also recall that in separation logic, we write triples of the shape \( \{ P \} C \{ Q \} \); such a triple is considered valid iff it is safe to execute the command \( C \) in a state satisfying the formula \( P \) and whenever we do so, any resulting state after \( C \) terminates (if it terminates at all) satisfies the formula \( Q \). We also have the \( * \) connective to combine formulas. This connective maps directly to the join operation of the underlying separation algebra. The following definition is by Dockins (2009), inspired by Calcagno (2007).
A (Multi-Unit) Separation Algebra is a pair \( \mathcal{A} = (A, j) \) of a set \( A \) and a relation \( j \subseteq A \times A \times A \) (called join) satisfying the following axioms (we write \( x + y = z \) to mean \( j(x,y,z) \) ):
- \( x + y = z_1 \) and \( x + y = z_2 \) implies \( z_1 = z_2 \) (functionality)
- \( x_1 + y = z \) and \( x_2 + y = z \) implies \( x_1 = x_2 \) (cancellativity)
- \( x + y = z \) implies \( y + x = z \) (commutativity)
- \( x + y = a \) and \( a + z = b \) implies existence of \( c \in A \) satisfying \( y + z = c \) and \( x + c = b \) (associativity)
- for every \( x \in A \) there exists some \( u_x \in A \) such that \( x + u_x = x \) (unit)
Furthermore, let us define a separateness relation and a subvalue relation by
- \( a \# b \) iff \( a + b = c \) for some \( c \)
- \( a \leq c \) iff \( a + b = c \) for some \( b \)
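For a concrete instance (a standard one, not taken from this post, but matching the example structure given later): finite maps under disjoint union form a multi-unit separation algebra, with the empty map as a unit. A minimal sketch:

```ocaml
(* Finite maps (heaps) under disjoint union as a separation algebra; a sketch. *)

module IntMap = Map.Make (Int)

type heap = int IntMap.t

(* join h1 h2 = Some h  mirrors the relation  j(h1, h2, h);
   it is defined exactly when the domains of h1 and h2 are disjoint. *)
let join (h1 : heap) (h2 : heap) : heap option =
  if IntMap.for_all (fun k _ -> not (IntMap.mem k h2)) h1
  then Some (IntMap.union (fun _ v _ -> Some v) h1 h2)  (* callback never used *)
  else None

(* The derived relations from the post. *)
let separate (a : heap) (b : heap) : bool =   (* a # b *)
  Option.is_some (join a b)

let subvalue (a : heap) (c : heap) : bool =   (* a ≤ c *)
  IntMap.for_all (fun k v -> IntMap.find_opt k c = Some v) a
```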
The \( * \) connective is then defined using the join relation:
\[ h \vDash P * R \iff \exists h_1, h_2.\, h_1 + h_2 = h \land h_1 \vDash P \land h_2 \vDash R \]
Crucially, the following proof rule (perhaps with some side conditions) holds in separation logic:
{ P } C { Q }
--------------------
{ P * R } C { Q * R }
For this all to make sense, the program state has to be a separation algebra.
In reachability logic, one expresses a similar idea, but with commands and states combined to form configurations. A reachability claim has the shape \( l \Rightarrow^{\forall} r \textit{ if } c_1,...,c_k \) where \( l \) is a pattern, \( r \) an expression term, and \( c_1,...,c_k \) conditions, and it is considered valid under a given language definition \( R \), written \( R \vDash l \Rightarrow^{\forall} r \textit{ if } c_1,...,c_k \), iff for every \( t, t^\prime,\delta \) such that \( t \rightsquigarrow_R t^\prime \) and \( t,\delta \vDash l \) and \( \delta \vDash c_i \) for every \( i \in \{ 1,...,k \} \) we have that \( t^\prime, \delta \vDash r \). But how do we translate the concept of separation into the language of reachability logic?
A straightforward idea is to assume a separation algebra over primitive values and extend it to terms, and also extend patterns and expression terms to support a new connective - \( * \).
Assume a separation algebra \( (V, j_V) \) over the set \( V \) of primitive values. The induced term separation algebra \( (T, j_T) \) over the set \( T \) of terms is defined by defining \( j_T \) inductively, as the smallest relation satisfying:
- \( j_V(v_1,v_2,v_3) \) implies \( j_T(v_1,v_2,v_3) \)
- \( s \in S \) and \( j_T(t_i, t'_i, t''_i) \) for all \( i \in \{ 1,...,k \} \) implies \( j_T(s[t_1,...,t_k], s[t'_1,...,t'_k], s[t''_1,...,t''_k]) \)
This way, two terms are compatible iff they have the same term-structure and values on corresponding positions are compatible.
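A sketch of the induced join \( j_T \), parameterised by the value-level join. The value-level join is given as a partial function rather than a relation, which is enough here because \( j \) is functional in its first two arguments (the functionality axiom):

```ocaml
(* The induced join on terms; a sketch. [join_v] stands for the join of (V, j_V). *)

type symbol = string
type 'v term = TVal of 'v | TApp of symbol * 'v term list

let rec join_term (join_v : 'v -> 'v -> 'v option)
                  (t1 : 'v term) (t2 : 'v term) : 'v term option =
  match t1, t2 with
  | TVal v1, TVal v2 ->
      Option.map (fun v -> TVal v) (join_v v1 v2)
  | TApp (s1, ts1), TApp (s2, ts2)
    when s1 = s2 && List.length ts1 = List.length ts2 ->
      (* same symbol and arity: join the children position-wise *)
      let rec go ts1 ts2 =
        match ts1, ts2 with
        | [], [] -> Some []
        | a :: ts1', b :: ts2' ->
            (match join_term join_v a b, go ts1' ts2' with
             | Some t, Some ts -> Some (t :: ts)
             | _ -> None)
        | _ -> None
      in
      Option.map (fun ts -> TApp (s1, ts)) (go ts1 ts2)
  | _ -> None   (* different term structure: not compatible *)
```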
The set \( P_s \) (of separation patterns) is defined as the smallest set \( P_s \) such that
- \( x \in X \) implies \( x \in P_s \);
- \( v \in V \) implies \( v \in P_s \);
- \( s \in S \) and \( p_1,...,p_n \in P_s \) implies \( s[p_1,...,p_n] \in P_s \); and
- \( p_1 \in P_s \) and \( p_2 \in P_s \) implies \( p_1 * p_2 \in P_s \).
(13) Separation expression term.
Given an algebraic signature \( \mathcal{F} = (F, a_F) \), the set \( ET_s \) of separating expression-terms is defined as the smallest set \( ET_s \) such that
- \( e \in ET_s \)
- \( s \in S \) and \( et_1,...,et_n \in ET_s \) implies \( s[et_1,...,et_n] \in ET_s \); and
- \( et_1 \in ET_s \) and \( et_2 \in ET_s \) implies \( et_1 * et_2 \in ET_s \).
(14) Separation pattern matching.
The separation pattern matching relation \( \_,\_ \vDash_{sp} \_ \subseteq T \times \Delta \times P_s \) is defined as the smallest relation such that
- \( v, \delta \vDash_{sp} v \) ;
- \( t, (\{ (x, t) \} \cup \delta) \vDash_{sp} x \) ;
- \( t_i, \delta \vDash_{sp} p_i \) for all \( i \in \{ 1,...,n \} \) implies \( s[t_1,...,t_n], \delta \vDash_{sp} s[p_1,...,p_n] \); and
- \( t_1, \delta \vDash_{sp} p_1 \) and \( t_2, \delta \vDash_{sp} p_2 \) and \( j_T(t_1, t_2, t) \) implies \( t, \delta \vDash_{sp} p_1 * p_2 \)
(15) Separation expression-term matching.
The separation expression-term matching relation \( \_,\_ \vDash_{set} \_ \subseteq T \times \Delta \times ET_s \) is defined as the smallest relation such that
- \( \delta(e), \delta \vDash_{set} e \) ;
- \( t_i, \delta \vDash_{set} et_i \) for all \( i \in \{ 1,...,n \} \) implies \( s[t_1,...,t_n], \delta \vDash_{set} s[et_1,...,et_n] \); and
- \( t_1, \delta \vDash_{set} et_1 \) and \( t_2, \delta \vDash_{set} et_2 \) and \( j_T(t_1, t_2, t) \) implies \( t, \delta \vDash_{set} et_1 * et_2 \).
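The separating relations can be sketched in the same style. The only interesting case is \( * \), which asks for the existence of a split of the matched term; the check below is therefore parameterised by an assumed enumeration `splits_v` of all value-level splits \( v = v_1 + v_2 \) (finite for structures like the example given later). All names are my own, and I only show the pattern side:

```ocaml
(* Separating patterns and their matching; a sketch only. *)

type symbol = string
type var = string
type 'v term = TVal of 'v | TApp of symbol * 'v term list
type 'v valuation = (var * 'v term) list

type 'v spattern =
  | SPVar of var
  | SPVal of 'v
  | SPApp of symbol * 'v spattern list
  | SPStar of 'v spattern * 'v spattern          (* p1 * p2 *)

(* All splits t = t1 + t2 in the induced term separation algebra. *)
let rec splits_t (splits_v : 'v -> ('v * 'v) list) (t : 'v term)
  : ('v term * 'v term) list =
  match t with
  | TVal v -> List.map (fun (a, b) -> (TVal a, TVal b)) (splits_v v)
  | TApp (s, ts) ->
      (* cartesian product of the children's splits, rebuilt under s *)
      List.fold_right
        (fun ti acc ->
           List.concat_map
             (fun (l, r) -> List.map (fun (ls, rs) -> (l :: ls, r :: rs)) acc)
             (splits_t splits_v ti))
        ts [ ([], []) ]
      |> List.map (fun (ls, rs) -> (TApp (s, ls), TApp (s, rs)))

let rec smatches splits_v (t : 'v term) (delta : 'v valuation) (p : 'v spattern)
  : bool =
  match t, p with
  | TVal v, SPVal v' -> v = v'
  | _, SPVar x -> List.assoc_opt x delta = Some t
  | TApp (s, ts), SPApp (s', ps) ->
      s = s' && List.length ts = List.length ps
      && List.for_all2 (fun ti pi -> smatches splits_v ti delta pi) ts ps
  | _, SPStar (p1, p2) ->
      (* t, δ ⊨ p1 * p2  iff some split t = t1 + t2 matches componentwise *)
      List.exists
        (fun (t1, t2) ->
           smatches splits_v t1 delta p1 && smatches splits_v t2 delta p2)
        (splits_t splits_v t)
  | _ -> false
```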
This way, we can state the framing rule as
R ⊨ l =>^∀ r if c_1,...,c_k
-------------------------------
R ⊨ l * f =>^∀ r * f if c_1,...,c_k
because the frame \( f \) is a (separating) pattern, and (separating) patterns are also (separating) expression-terms.
However, it does not hold.
As I have written in my previous post on this topic, the framing property breaks when a rewriting rule uses equality in an ugly way, or when it uses certain functions over primitive values (boolean negation, comparison to false, or other kind-of ’negative’ queries). A solution is to disallow direct comparison of values in side conditions and to allow only the queries explicitly provided by the underlying structure over primitive values.
Fixing the Framing
Thus, the underlying structure will have explicit predicates. FOL structures are good candidates for that.
(18) Nice separating FO structure.
A separating FO structure over a FO signature \( \Sigma = (F, Q, a_F, a_Q) \) is a tuple
\[ \mathfrak{S} = (V, (r_x)_{x \in F \cup Q}, j ) \]
such that
- \( (V, j) \) is a separation algebra; and
- \( (T, (r_x)_{x \in F \cup Q}) \) is a \( \Sigma \)-FOL structure (where \( T \) is the set of terms over primitive values \( V \) ).
We additionally say that \( \mathfrak{S} \) is nice, if
- \( t_i \leq t'_i \) for all \( i \in \{ 1,...,k \} \) and \( k = a_Q(q) \) and \( (t_1,...,t_k) \in r_q \) implies \( (t'_1,...,t'_k) \in r_q \) (where the \( \leq \) operator is from the induced separation algebra on terms over primitive values \( V \)) (predicate monotonicity); and
- \( j(t_i, t''_i, t'_i) \) for all \( i \in \{ 1,...,k \} \) and \( k = a_F(f) \) implies \( j(r_f(t_1,...,t_k), r_f(t''_1,...,t''_k), r_f(t'_1,...,t'_k)) \) (operation niceness).
Now, let me consider rules of the shape l => r if c_1,...,c_k, where each \( c_i \) is a predicate applied to an appropriate number of patterns, such that every variable occurring in the conditions also occurs in \( l \).
The following theorem holds, and from that the soundness of the Framing rule follows.
A Question of expressivity
Let me give an example of such a nice separating FO structure that may be used in a practical language definition.
(20) Example separating FO structure.
- Let \( F = \{ \textit{plus}, \textit{uminus}, \textit{zero}, \textit{one}, \textit{emptymap}, \textit{get}, \textit{update}, \textit{erase} \} \), and \( a_{F} : F \to \mathbb{N}_0 \) be defined by
- \( a_{F}(\textit{plus}) = 2 \);
- \( a_{F}(\textit{uminus}) = 1 \);
- \( a_{F}(\textit{zero}) = 0 \);
- \( a_{F}(\textit{one}) = 0 \);
- \( a_{F}(\textit{emptymap}) = 0 \);
- \( a_{F}(\textit{get}) = 2 \);
- \( a_{F}(\textit{update}) = 3 \);
- \( a_{F}(\textit{erase}) = 2 \).
- Let \( Q = \{ \textit{eqint}, \textit{ltint}, \textit{has} \} \), and \( a_{Q} : Q \to \mathbb{N}_0 \) be defined by
- \( a_{Q}(\textit{eqint}) = 2 \);
- \( a_{Q}(\textit{ltint}) = 2 \);
- \( a_{Q}(\textit{has}) = 2 \);
- Let \( \Sigma = (F, Q, a_{F}, a_{Q}) \) be a FO signature.
- Let \( V_{map} \) denote the set of finite maps on integers; each finite map is a finite set of integer pairs. Let \( V = \{ (0,0) \} \cup \{ (1, z) \mid z \in \mathbb{Z} \} \cup \{ (2, m) \mid m \in V_{map} \} \).
- Let
- \( r_{\textit{plus}}((1,x), (1,y)) = (1,x+y) \) for all \( x, y \in \mathbb{Z} \), and for all other \( u, v \), let \( r_{\textit{plus}}(u, v) = (0,0) \);
- \( r_{\textit{uminus}}((1,x)) = (1,-x) \), and for all other \( u \), let \( r_{\textit{uminus}}(u) = (0,0) \);
- \( r_{\textit{zero}}() = (1,0) \);
- \( r_{\textit{one}}() = (1,1) \);
- \( r_{\textit{emptymap}}() = (2, \emptyset) \);
- \( r_{\textit{get}}((2, m), (1, k)) = (1, v) \) whenever \( (k,v) \in m \), and for all other \( u, v \), let \( r_{\textit{get}}(u,v) = (0,0) \);
- \( r_{\textit{update}}((2, m), (1, k), (1, v)) = (2, \{ (k,v) \} \cup \{ (k',v') \mid (k',v') \in m \land k' \not= k \} ) \) if \( (k,v'') \in m \) for some \( v'' \), and for all other \( u, v, w \), let \( r_{\textit{update}}(u,v,w) = (0,0) \);
- \( r_{\textit{erase}}((2, m), (1, k)) = (2, \{ (k',v') \mid (k',v') \in m \land k' \not= k \} ) \);
- \( r_{\textit{eqint}} = \{ ((1,x),(1,x)) \mid x \in \mathbb{Z} \} \);
- \( r_{\textit{ltint}} = \{ ((1,x),(1,y)) \mid x,y \in \mathbb{Z} \land x < y \} \);
- \( r_{\textit{has}} = \{ ((2,m),(1,k)) \mid m \in V_{map} \land k \in \mathbb{Z} \} \); and
- Let \( j \) be defined to be the smallest relation such that
- for every \( u \), \( (u,u,u) \in j \);
- for every \( u \), \( ((0,0),u,u) \in j \);
- for every \( u \), \( (u,(0,0),u) \in j \); and
- for every \( m_1, m_2 \in V_{map} \) such that \( m_1 \cap m_2 = \emptyset \), \( ((2, m_1), (2, m_2), (2, m_1 \cup m_2)) \in j \).
- Let \( \mathfrak{S}_{ex} = (V, (r_{x})_{x \in F \cup Q}, j) \).
With such a nice separating FO structure we can give rewriting rules that work with numbers, read from the heap, write to the heap, and erase from the heap. But this structure cannot allocate a new heap element: \( \textit{update} \) checks whether the key is already in the map. If it did not, the structure would not be nice: for example, let \( m = \{ (100, 23) \} \), \( m_1 = \emptyset \), \( m_2 = \{ (100, 23) \} \), and \( k = 100 \). Then \( ((2,m_1),(2,m_2),(2,m)) \in j \), and an unchecked \( \textit{update}((2,m_1),(1,k),(1,24)) \) would yield \( (2,\{ (100, 24) \}) \), which is certainly not compatible with \( (2,m_2) \).
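To make the example structure more tangible, here is a sketch of its carrier and a few of its operations in OCaml: integers are tagged with 1, finite maps with 2, and a default element stands for \( (0,0) \). The operations and the join follow the definition above, but the encoding and the names are my own, and I omit some of the operations and predicates:

```ocaml
(* A sketch of (part of) the example separating FO structure. *)

module IntMap = Map.Make (Int)

type value =
  | Default                 (* (0,0) *)
  | Int of int              (* (1,z) *)
  | Map of int IntMap.t     (* (2,m) *)

let r_plus a b =
  match a, b with Int x, Int y -> Int (x + y) | _ -> Default

let r_get m k =
  match m, k with
  | Map m, Int k ->
      (match IntMap.find_opt k m with Some v -> Int v | None -> Default)
  | _ -> Default

(* update only overwrites an existing key, as required for niceness *)
let r_update m k v =
  match m, k, v with
  | Map m, Int k, Int v when IntMap.mem k m -> Map (IntMap.add k v m)
  | _ -> Default

let r_erase m k =
  match m, k with Map m, Int k -> Map (IntMap.remove k m) | _ -> Default

(* two of the predicates *)
let r_eqint a b = match a, b with Int x, Int y -> x = y | _ -> false
let r_ltint a b = match a, b with Int x, Int y -> x < y | _ -> false

(* the join j: (0,0) is a unit, every element joins with itself,
   and maps with disjoint domains join to their union *)
let join a b =
  match a, b with
  | Default, x | x, Default -> Some x
  | Int x, Int y -> if x = y then Some (Int x) else None
  | Map m1, Map m2 ->
      if IntMap.for_all (fun k _ -> not (IntMap.mem k m2)) m1
      then Some (Map (IntMap.union (fun _ v _ -> Some v) m1 m2))
      else if IntMap.equal ( = ) m1 m2 then Some (Map m1)
      else None
  | _, _ -> None
```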
We can fix this with nondeterminism/underspecification, as I have suggested in the previous post.
Oracles to the rescue
The idea is that the operations of the structure will be given one more parameter: a value chosen non-deterministically in each step of execution. That allows the execution to branch, and thus the semantics to be underspecified. Maybe “oracle” is not the right word for this use-case, but it will do for now.
Let me give an example similar to the previous one, but this time with oracles, which are simple functions that generate a fresh element with respect to a particular (finite) set of integers. I will also add a new binary operation, \( \mathit{alloc} \), that inserts a new value into a map while taking care not to overwrite existing values, and a new unary operation, \( \mathit{keyofalloc} \), that, given a map, returns the key of the newly allocated position in it.
(24) Example oracled separating FO structure.
Let \( F, Q, a_{F}, a_{Q} \) be as in the previous definition, and let \( O = \{ f \in \mathbb{Z}^{\mathcal{P}_{fin}(\mathbb{Z})} \mid \forall z, x.\ z \in x \implies z \not= f(x) \} \). Let \( \Sigma' = (F \cup \{ \mathit{alloc}, \mathit{keyofalloc} \}, Q, a_{F} \cup \{ (\mathit{alloc}, 2), (\mathit{keyofalloc}, 1) \}, a_{Q}, O) \). Let
- \( R(o)_{\mathit{alloc}}((2, m), (1, v)) = (2, m \cup \{ (o(\{ k' \mid (k',v') \in m \}), v) \} ) \), and for all other \( u, v \), let \( R(o)_{\mathit{alloc}}(u,v) = (0,0) \);
- \( R(o)_{\mathit{keyofalloc}}((2, m)) = (1, o(\{ k' \mid (k',v') \in m \})) \), and for all other \( u \), let \( R(o)_{\mathit{keyofalloc}}(u) = (0,0) \); and
- \( R(o)_f = r_f \) for all other \( f \in F \);
- \( R(o)_q = r_q \) for all \( q \in Q \);
- \( \mathfrak{S}'_{ex} = (V, R, j) \).
We can now write rules like this one:
(x := alloc(z), stack, heap) => (skip, stack[x := keyofalloc(heap)], alloc(heap, z))
while preserving framing. (Of course, we need to manage the stack - the mapping of local variables to integers - but that is not a problem in Minuska or K.) I should also add that the semantics of \( \rightsquigarrow \) changes: a step is now possible if there exists an oracle value under which everything is in the right relations, and this oracle value has to be propagated through all the inductive definitions. That is easy to do, and I will not spell it out in this post, because anyone can imagine it.
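For completeness, here is a sketch of the two oracled operations, re-declaring the value encoding from the previous sketch. The oracle is any function returning an integer not contained in the given finite set of keys, and it is threaded through as an extra parameter, so within a single step alloc and keyofalloc agree on the chosen key; the concrete oracle `next_free` at the end is just one possible choice:

```ocaml
(* A sketch of the oracled operations alloc and keyofalloc. *)

module IntMap = Map.Make (Int)
module IntSet = Set.Make (Int)

type value = Default | Int of int | Map of int IntMap.t

(* An oracle returns an integer that is not in the given finite set. *)
type oracle = IntSet.t -> int

let keys m = IntMap.fold (fun k _ acc -> IntSet.add k acc) m IntSet.empty

(* R(o)_alloc: insert the value v at a fresh key chosen by the oracle. *)
let r_alloc (o : oracle) m v =
  match m, v with
  | Map m, Int v -> Map (IntMap.add (o (keys m)) v m)
  | _ -> Default

(* R(o)_keyofalloc: the key that the same oracle would choose for this map. *)
let r_keyofalloc (o : oracle) m =
  match m with
  | Map m -> Int (o (keys m))
  | _ -> Default

(* One possible oracle: one more than the largest key (0 for the empty map). *)
let next_free : oracle =
  fun ks -> if IntSet.is_empty ks then 0 else IntSet.max_elt ks + 1
```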
Conclusion
Besides Framing, I am fairly certain that we can reuse the reachability-logic infrastructure - the proof system, the symbolic execution engine (one is being developed in Minuska), and so on. I just wanted to nail down the theory. Calcagno's 2003 paper was super inspiring, and the definition of separation algebra is taken straight from Dockins's 2009 paper. If you know some separation logic and are interested in this idea, please let me know - we can co-write a paper about it.