Bug characteristics in open source software


Recently I came across an interesting paper from 2013 published in Empirical Software Engineering. In the paper, titled Bug characteristics in open source software, the authors report on their study of bugs in the Linux kernel, Mozilla (the code base from which today's Firefox grew), and the Apache web server (all C/C++ projects at that time). They manually classified over 2000 bugs and, among other insights, found that around 2/3 of security vulnerabilities were caused by semantic bugs (which could also be labeled as “functional correctness” issues), compared to around 1/3 caused by memory bugs (buffer overflows, double frees, etc.). Moreover, the proportion of memory bugs decreased over time as the projects matured. Does this mean that the efforts put into memory-safe languages like Rust are, so to speak, beating a dead bug?

The paper

Now, these “thirds” are only rough numbers, since some bugs were labeled with multiple categories. I would encourage everyone interested to dive into the original paper for a more precise understanding. Here I only summarize a few points from it.

The paper is packed with information. The authors pose seven research questions, some of which have sub-questions. Besides testing the common belief about the origin of security vulnerabilities, which I referred to in the hook of this post, they study concurrency bugs and bugs in graphical user interfaces, as well as machine learning techniques for automatically classifying bugs.

Regarding the methods used in the paper: I can’t assess them, since this is way too far from the research I do, but I find them both interesting and reasonable. The authors collected three sets of bugs: (1) a randomly sampled set of bugs (1135 in total), (2) security bugs (583 in total), and (3) concurrency bugs (1387 in total). The random sample contains only bugs that were fixed, so that the root cause of each bug is well known. Security bugs were taken from the US National Vulnerability Database (NVD); the authors took every vulnerability that NVD had already classified. Concurrency bugs are relatively rare, so random sampling would not find many of them; instead, the authors automatically searched bug reports for concurrency-related keywords (like “deadlock”) and manually verified whether the hits were really concurrency bugs. Besides classifying bugs by root cause (memory/concurrency/semantic), the authors also classify them by impact (hang/crash/data corruption, etc.) and by the component they occur in. In addition, security bugs were classified into categories (confidentiality, integrity, etc.).

The sampled bugs spanned approximately 10 years: the oldest Mozilla bug report dates from 1998, the oldest Linux one from 2002, and the sampling was done in 2010. This allowed the authors to plot how the proportion of memory-related bugs evolved over time. For example, in the Mozilla browser, around 30% of the sampled bugs from 1999 were memory-related, which decreased to around 10% in 2009. The paper attributes this decrease to the increasing use of tools like Valgrind and Coverity. Another interesting observation is that concurrency bugs often cause program hangs (in over 40% of cases) but rarely data corruption (around 1% of cases). And then there is the automatic classification of bugs, plus the appendices. The paper is 37 pages long, excluding references.
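To make the “hang” outcome concrete, here is a minimal sketch (my own, not taken from the paper) of the classic lock-ordering deadlock, written in Rust since that is the language the question at the end of this post is about. Two threads take two locks in opposite orders. To keep the example from actually hanging, each thread uses `try_lock` for its second lock and merely reports that a blocking `lock()` would have waited forever. The function name `lock_order_inversion` is hypothetical. Note that the compiler accepts this code: ownership types rule out data races, not deadlocks.

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Hypothetical demo: two threads acquire locks A and B in opposite order.
/// Returns, for each thread, whether its second lock was unavailable,
/// i.e. whether a blocking `lock()` call would have deadlocked.
fn lock_order_inversion() -> (bool, bool) {
    let a = Arc::new(Mutex::new(0));
    let b = Arc::new(Mutex::new(0));
    let barrier = Arc::new(Barrier::new(2));

    let (a1, b1, bar1) = (Arc::clone(&a), Arc::clone(&b), Arc::clone(&barrier));
    let t1 = thread::spawn(move || {
        let _guard_a = a1.lock().unwrap(); // thread 1 takes A first
        bar1.wait(); // wait until thread 2 holds B
        let blocked = b1.try_lock().is_err(); // `b1.lock()` here would hang forever
        bar1.wait(); // keep holding A until thread 2 has checked too
        blocked
    });

    let (a2, b2, bar2) = (Arc::clone(&a), Arc::clone(&b), Arc::clone(&barrier));
    let t2 = thread::spawn(move || {
        let _guard_b = b2.lock().unwrap(); // thread 2 takes B first
        bar2.wait(); // wait until thread 1 holds A
        let blocked = a2.try_lock().is_err(); // `a2.lock()` here would hang forever
        bar2.wait(); // keep holding B until thread 1 has checked too
        blocked
    });

    (t1.join().unwrap(), t2.join().unwrap())
}

fn main() {
    let (t1_blocked, t2_blocked) = lock_order_inversion();
    println!("thread 1 would hang: {t1_blocked}, thread 2 would hang: {t2_blocked}");
}
```

The barriers make the interleaving deterministic; in real code the same inversion only hangs under an unlucky schedule, which is part of why such bugs are hard to find.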

I would like to read more papers like this, especially newer ones. From Empirical Software Engineering, my reading list includes Towards understanding bugs in Python interpreters and A study of common bug fix patterns in Rust; the second might also provide some insight into the question.

The question

Getting rid of a third of all security vulnerabilities would be nice. One aspect I find particularly appealing is that this increased security would come essentially without any extra effort beyond learning a programming language with ownership types (such as Rust). However, what shall we do with the remaining two thirds?
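The split between the two kinds of bugs can be illustrated with a small Rust sketch (my own, not from the paper). A double free is rejected at compile time by the ownership rules, and out-of-bounds slice indexing panics at run time instead of corrupting memory. A semantic bug, such as an off-by-one in a hypothetical token-expiry check (`is_token_valid_buggy` and `is_token_valid` are made-up names), compiles and runs without complaint, and ownership types have nothing to say about it.

```rust
// Memory bugs such as a double free are rejected by the compiler. Uncommenting
// the lines below fails with error[E0382] ("use of moved value: `v`"):
//
//     let v = vec![1, 2, 3];
//     drop(v);
//     drop(v);
//
// Out-of-bounds indexing into a slice is checked at run time and panics instead
// of silently reading or writing adjacent memory.

/// Hypothetical check: a token is meant to be valid strictly before its expiry
/// time. This version has a semantic (off-by-one) bug: it also accepts the
/// token at the exact expiry instant. It compiles without any warning.
fn is_token_valid_buggy(now: u64, expires_at: u64) -> bool {
    now <= expires_at
}

/// Corrected version of the same hypothetical check.
fn is_token_valid(now: u64, expires_at: u64) -> bool {
    now < expires_at
}

fn main() {
    let expires_at = 1_000;
    // At the expiry instant the buggy check still grants access.
    println!("buggy: {}", is_token_valid_buggy(expires_at, expires_at)); // prints "buggy: true"
    println!("fixed: {}", is_token_valid(expires_at, expires_at)); // prints "fixed: false"
}
```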