What is so surprising about cross-language attacks?


“Cross-Language Attacks” is the title of a 2022 paper published at the Network and Distributed System Security Symposium (NDSS) - an A-ranked conference (according to ERA). The paper’s abstract says:

[…] we illustrate that the incompatible set of assumptions made in various languages enables attacks that are not possible in each language alone.

But why would anyone think otherwise?

On incompatible assumptions

Consider this C program:

#include <stdlib.h>
#include <stdio.h>
int * goo() { int *p = (int *)malloc(sizeof(int)); *p = 4; return p; } /* allocates an int and initializes it */
int foo(int * x) { return *x; }                                        /* reads through x */
void bar(int *x) { free(x); }                                          /* frees x */
int main() {
    int * p = goo();
    printf("%d\n", foo(p));
    bar(p);
    return 0;
}

This seems fine, because the assumptions and guarantees of goo, foo, and bar are compatible with respect to the composition done by main. However, if we did instead

int main() {
    int * p = goo();
    bar(p);
    printf("%d\n", foo(p));
    return 0;
}

we would get a use-after-free problem - the assumption of foo (that its argument points to valid memory containing an integer) is not met. One can say that it is the incompatible set of assumptions made in various parts of the application that is at fault here.

Suppose we write the main function in Rust instead. The program would still be broken, and we would now have an “incompatible set of assumptions” made in “various languages”. But the “various languages” part is completely incidental: a moment ago all the assumptions were made in a single language, and the problem already existed. The incompatible assumptions are the problem.
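For concreteness, here is a minimal sketch of what that might look like, assuming the three C functions above are compiled and linked into the Rust program (the build wiring is omitted):

use std::os::raw::c_int;

extern "C" {
    fn goo() -> *mut c_int;
    fn foo(x: *mut c_int) -> c_int;
    fn bar(x: *mut c_int);
}

fn main() {
    unsafe {
        let p = goo();
        bar(p);                 // frees the allocation behind p
        println!("{}", foo(p)); // use-after-free, exactly as before
    }
}

Rust happily compiles this; the unsafe block is precisely the place where the programmer takes responsibility for the C functions’ assumptions.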

What about counter-measures?

The abstract also says:

Memory corruption attacks against unsafe programming languages like C/C++ have been a major threat to computer systems for multiple decades. Various sanitizers and runtime exploit mitigation techniques have been shown to only provide partial protection at best. […] We show that because language safety checks in safe languages and exploit mitigation techniques applied to unsafe languages (e.g., Control-Flow Integrity) break different stages of an exploit to prevent control hijacking attacks, an attacker can carefully maneuver between the languages to mount a successful attack.

In other words: we have a counter-measure that may not help; we then make the situation more complicated by adding a new language to the application, and the counter-measure still may not help. Again, this is not surprising. We would need a really good reason to think otherwise.

Concrete examples from the paper

The paper features a number of illustrations. In Figure 6, Rust code calls the following C function:

void vuln_fn(int64_t array_ptr_addr) {
    int64_t array_index = 3;
    int64_t array_value = get_attack();
    int64_t* a = (void *)array_ptr_addr;
    a[array_index] = array_value;
}

The function takes an integer, interprets it as a pointer, and writes through that pointer. However, in C one is allowed to convert an integer to a pointer only in very specific circumstances - essentially, when the integer itself is the result of a pointer-to-integer conversion. Calling the function with arbitrary values that are not results of such a conversion leads to undefined behavior.

But what if you give it a Rust pointer?

I think this may be the source of the confusion. Just because Rust and C both call something a “pointer” does not mean that they mean the same thing by the term.

But doesn’t that mean that in principle, values (of any type) in Rust and C are incompatible? How can you then transfer data between Rust and C at all?

Well, yes, it means exactly that. If one wants to bridge the gap between two languages, one needs a well-specified interface: assumptions on the caller expressed in the terms of the caller’s language, and guarantees for the callee given in the terms of the callee’s language. This is usually called an “FFI” (foreign function interface).
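To make this concrete, here is a sketch of what such an interface might look like for vuln_fn on the Rust side (assuming the C function, together with its get_attack dependency, is linked in; the contract in the comment is my wording, not the paper’s):

extern "C" {
    // Contract, stated in the caller's terms: `array_ptr_addr` must be the
    // result of converting a pointer to an allocation that is valid for
    // writes of at least four i64 values. Anything else is undefined behavior.
    fn vuln_fn(array_ptr_addr: i64);
}

fn main() {
    let mut buf = [0i64; 4];
    // Meeting the contract: the integer really is a converted pointer,
    // and index 3 is in bounds.
    unsafe { vuln_fn(buf.as_mut_ptr() as i64) };
    println!("{:?}", buf);
}

The unsafe keyword does not make the call safe; it marks the spot where the caller promises to uphold the stated contract.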

Here is the point: yes, when working in multiple languages, it is easy to make invalid assumptions about “the other side”. But the same is true when working in a single language and using libraries through their interfaces.

Attacks and threat models?

Section 3 of the paper describes something I find very confusing.

Attacks thus have four essential phases: i) memory corruption, ii) gadget injection, iii) control-flow hijacking, and iv) weird machine execution. To stop an attack, it is sufficient for a defender to disrupt any of these steps, though in practice defenses have focused on steps i and iii.

The paper then considers a protection against control-flow hijacking - that is, against step iii. But what does it mean to protect step iii without protecting step i? What is a program running with corrupted memory supposed to do? The paper basically says: “if we have a protection against control-flow hijacking (but not against memory corruption) in one language, and we add another language into the mix, something bad may happen”. That is true. But the paper, as I read it, seems to imply that the “something bad may happen” part is the fault of adding another language to the mix. I would offer a different reading: the statement is true because the “we have a protection against control-flow hijacking (but not against memory corruption)” part is false. The paper itself says that control-flow-integrity protection often does not work well enough in practice. But then, why care?

Conclusion

I guess that people might be interested in protecting software despite the presence of bugs in it. Just because my software is broken does not necessarily mean that an attacker should have an easy time attacking it. So maybe what this paper really says is: “hard-to-attack-ness does not compose”.

I had a really hard time reading this paper. It is written in plain English, which is nice, but the lack of precise definitions left me confused. For example, what would be the precise meaning of a threat model graph with respect to a particular program? At the same time, the paper comes from an A-ranked conference, so it must have made sense to other people. Maybe I am missing something, but it seems to me that the paper confuses the incidental with the fundamental. If you have any idea of what I am missing, please let me know.