Pointer-manipulating programs are hard to get right, in particular due to the possibility of pointer aliasing. To enforce a good discipline, programming languages such as Cyclone and Rust employ ownership types. These restrict the allowed shapes of object graphs in memory and/or enforce encapsulation by limiting the mutations that can happen through certain edges in the graph. A well-designed ownership type system ensures that “well-typed programs do not go wrong”, where the notion of “going wrong” includes accesses through dangling pointers.
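As a tiny illustration (my example, not from the paper), Rust's borrow checker forbids mutating a value while an alias to it is live; the commented lines show a program it would reject:

```rust
// Returns the vector after a safe mutation; the commented-out lines
// show the aliasing pattern the borrow checker rejects.
fn extend(mut v: Vec<i32>) -> Vec<i32> {
    // let alias = &v;     // a shared borrow of `v`...
    // v.push(4);          // ...would conflict with this mutation
    // println!("{:?}", alias);
    v.push(4); // fine: no live borrow, so no reference can dangle
    v
}

fn main() {
    assert_eq!(extend(vec![1, 2, 3]), vec![1, 2, 3, 4]);
}
```

This matters for safety because `push` may reallocate the vector's buffer, which would turn any live reference into it into a dangling pointer.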
However, the advantages of ownership types are not limited to program safety. In their PLDI'22 paper titled “Modular Information Flow through Ownership”, Will Crichton, Marco Patrignani, and Maneesh Agrawala demonstrate that the information encoded in well-typed programs of an ownership-typed language can be exploited to perform modular analyses of the flows of data between program entities. This in turn enables programming language engineers to write tools that perform information flow control (for example, checking that public outputs do not depend on private keys) or program slicing (which can be exploited for verification, as in the Symbiotic tool) and that scale to large programs.
The paper is accompanied by Flowistry - an analyzer of Rust programs, available as a VSCode plugin.
Extending the type-checker
Typically, the main complication of data-flow analyses is in dealing with pointers. The crucial insight of the paper is that in an ownership-typed language, the type checker already has a good-enough approximation of the “points-to” set of any particular pointer. Therefore, the type checker can be augmented to perform the data-flow analysis.
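To make this insight concrete, here is a minimal sketch (my own, with hypothetical names) of the transfer function for an assignment through a reference, `*p = e`: the dependencies of `e` are propagated to every location the type checker says `p` may point to.

```rust
use std::collections::{HashMap, HashSet};

type Loc = &'static str;

/// Dependency context Θ: each location maps to the set of
/// locations its current value may depend on.
type Deps = HashMap<Loc, HashSet<Loc>>;

/// Transfer function for `*p = e`, where `points_to` is the
/// approximation the type checker already has for `p`, and
/// `e_deps` are the dependencies of the right-hand side.
fn assign_through_ref(theta: &mut Deps, points_to: &[Loc], e_deps: &HashSet<Loc>) {
    for &l in points_to {
        // Weak update: `p` may point to several places, so we add
        // rather than overwrite (a strong update needs a singleton set).
        theta.entry(l).or_default().extend(e_deps.iter().copied());
    }
}

fn main() {
    let mut theta: Deps = HashMap::new();
    let secret: HashSet<Loc> = ["secret"].into_iter().collect();
    // Suppose `p` may point to `x` or `y`: both pick up the dependency.
    assign_through_ref(&mut theta, &["x", "y"], &secret);
    assert!(theta["x"].contains("secret"));
    assert!(theta["y"].contains("secret"));
}
```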
On the conceptual level, both type checking and the information flow analysis can be represented as an inference system - and they are presented that way in the paper. To make their reasoning precise, the authors based their work on Oxide - a formalization 1 of the Rust programming language as an operational semantics, a rule-based type checker, and a proof of soundness of the type checker with respect to the operational semantics by means of progress and preservation theorems, as is customary. In Oxide, typing judgements have the form \( \Sigma; \Delta; \Gamma \vdash e : \tau \Rightarrow \Gamma^\prime \) 2 , meaning something like “in the context of \( \Sigma \), \( \Delta \), \( \Gamma \), the expression \( e \) has the type \( \tau \) and after its evaluation, the stack typing will be \( \Gamma^\prime \)”. This is visible in the typing rule for sequential composition
where the “output stack typing” \( \Gamma_1 \) of the first expression is used (with some modifications performed by gc-loans) to type-check the second expression.
As a side note: one reason why the output stack typing is needed is that the expression being typed may involve moving values of variables, which destroys them.
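For instance (my example), after a move the source variable can no longer be used, which is exactly the kind of change the output stack typing records:

```rust
/// Moving a String into this function consumes it; the caller
/// loses access to the argument afterwards.
fn consume(s: String) -> usize {
    s.len()
}

fn main() {
    let s = String::from("owned");
    let n = consume(s);   // the value of `s` moves into `consume`...
    // println!("{}", s); // ...so this later use would be rejected:
    //                    // `s` no longer has a usable type in the
    //                    // stack typing after the call
    assert_eq!(n, 5);
}
```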
The augmented type system derives judgements of the form \( \Sigma; \Delta; \Gamma; \Theta \vdash e_l : \tau \bullet \kappa \Rightarrow \Gamma^\prime; \Theta^\prime \), where:
- \( l \) is a location marker;
- \( \kappa \) is a set of dependencies (which are locations);
- \( \Theta \) and \( \Theta^\prime \) are (input and output) dependency contexts - mappings from memory locations to sets of dependencies.
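To make the shapes of \( \kappa \) and \( \Theta \) concrete, here is a sketch (mine, with a toy expression language) of how a dependency set could be computed: reading a location depends on the location itself plus whatever Θ records for it, and a compound expression takes the union of its parts.

```rust
use std::collections::{HashMap, HashSet};

type Loc = &'static str;
type Theta = HashMap<Loc, HashSet<Loc>>;

/// A toy expression language: variable reads and additions.
enum Expr {
    Var(Loc),
    Add(Box<Expr>, Box<Expr>),
}

/// Compute κ, the set of locations the value of `e` depends on.
fn kappa(theta: &Theta, e: &Expr) -> HashSet<Loc> {
    match e {
        Expr::Var(l) => {
            // Recorded dependencies of the location, if any...
            let mut k: HashSet<Loc> = theta.get(l).cloned().unwrap_or_default();
            // ...plus the location itself, since we read it.
            k.insert(*l);
            k
        }
        Expr::Add(a, b) => kappa(theta, a).union(&kappa(theta, b)).copied().collect(),
    }
}

fn main() {
    let mut theta: Theta = HashMap::new();
    // Suppose `y` was previously computed from `secret`.
    theta.insert("y", ["secret"].into_iter().collect());
    let e = Expr::Add(Box::new(Expr::Var("x")), Box::new(Expr::Var("y")));
    let k = kappa(&theta, &e);
    assert!(k.contains("x") && k.contains("y") && k.contains("secret"));
}
```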
The typing rules extend Oxide’s typing rules. For example, here is a simple rule from the Oxide paper: And here is the same rule from the main paper: The authors even red-colored the additions! 3
The most interesting rule is the one for an assignment through a reference. The key here is the premise \( \Delta; \Gamma_1 \vdash_{uniq} p \Rightarrow \{ \overline{l} \} \), which is used to extract from the type system the “points-to” information of \( p \) (where \( \{ \overline{l} \} \) is a list of place expressions). But how does the type system know where \( p \) points to? From \( \Gamma \)! Specifically, \( \Gamma \) is a list of frames, each frame being a list of elements of the shape \( x : \tau \) (saying that the identifier \( x \) has the type \( \tau \), possibly mentioning regions in the case of reference types) or \( r \mapsto \{ \overline{l} \} \) (saying that the region 4 \( r \) has the set \( \{ \overline{l} \} \) of loans).
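Here is a sketch (my names, not the paper's) of the lookup that premise performs: the type of the reference \( p \) mentions a region, \( \Gamma \) maps that region to its loan set, and that loan set is exactly the points-to approximation.

```rust
use std::collections::HashMap;

type Place = &'static str;
type Region = &'static str;

/// The part of Γ relevant here: regions mapped to their loan sets.
struct Gamma {
    loans: HashMap<Region, Vec<Place>>,
}

impl Gamma {
    /// Resolve where a reference whose type mentions region `r` may
    /// point: simply the loans recorded for `r` in the context.
    fn points_to(&self, r: Region) -> &[Place] {
        self.loans.get(r).map(|v| v.as_slice()).unwrap_or(&[])
    }
}

fn main() {
    let mut loans = HashMap::new();
    // E.g. after `let p = if c { &mut x } else { &mut y };`,
    // the region in `p`'s type carries the loans {x, y}.
    loans.insert("'r", vec!["x", "y"]);
    let gamma = Gamma { loans };
    assert_eq!(gamma.points_to("'r"), ["x", "y"]);
}
```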
How is this modular?
Let’s have a look at the rule for function call.
When calling a function \( f \), the function is retrieved from the environment of global functions \( \Sigma \).
I will not describe this rule in detail (see the paper!) and only note that the unique-loans expression computes, intuitively, the set of place expressions that can be mutated by following the references leading from the place \( \pi \) used as the function’s argument.
However, the function’s body, \( e \), is never used in the rest of the rule; the analysis is therefore independent of the body, and thus modular.
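A sketch (mine, heavily simplified) of what this buys you: from a function's signature alone, only arguments passed behind a unique (`&mut`) reference can be mutated by the call, so a caller-side transfer function never needs to look at the callee's body.

```rust
use std::collections::HashSet;

/// A (very) simplified parameter type: passed by value,
/// by shared reference, or by unique reference.
enum ParamTy {
    ByValue,
    SharedRef,
    UniqueRef,
}

/// From a function *signature* alone, compute which argument places
/// a call may mutate: exactly those behind a unique reference.
/// The callee's body is never consulted - this is the modularity.
fn may_mutate(params: &[(&'static str, ParamTy)]) -> HashSet<&'static str> {
    params
        .iter()
        .filter(|(_, t)| matches!(t, ParamTy::UniqueRef))
        .map(|(name, _)| *name)
        .collect()
}

fn main() {
    // fn f(x: &mut i32, y: &i32, z: i32) { /* body irrelevant */ }
    let sig = [
        ("x", ParamTy::UniqueRef),
        ("y", ParamTy::SharedRef),
        ("z", ParamTy::ByValue),
    ];
    let m = may_mutate(&sig);
    assert!(m.contains("x") && !m.contains("y") && !m.contains("z"));
}
```

The real rule is subtler (references reachable *through* arguments also count, which is what unique-loans computes), but the signature-only character is the same.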
Conclusion
Besides proving the soundness of their analysis, the authors also implemented a tool in Rust for Rust and used it to analyze 10 Rust packages (“crates”) with sizes between 10 kLoC and 55 kLoC. They also created a few other versions of their tool, including a whole-program version, and compared them.
One question I would have for the authors: would it be possible to implement Flowistry in a non-intrusive way - that is, without heavy instrumentation of the type checker? As both typing and data flow could be considered abstract interpretations, maybe the two could communicate using standard tools for communication between analyses.
1. As far as I know, the formalization is pen-and-paper only - no proof assistants involved. ↩︎
2. Actually, in the Oxide paper, typing judgements have the form \( \Sigma; \Delta; \Gamma; \Theta \vdash e : \tau \Rightarrow \Gamma^\prime \), where the additional \( \Theta \) is called “temporary typing”; this is a subtle technical detail, so the main paper elides it and so do I. Note that Oxide’s \( \Theta \) is different from the main paper’s \( \Theta \). ↩︎
3. Sometimes there are mild syntactical differences between the corresponding rules. ↩︎
4. Regions are also called provenances or lifetimes. ↩︎