
Debugging Reproducibility Issues in Rust Software

Recently, I packaged pimsync for Guix and, as part of the process, was made aware that its build was not reproducible. The pimsync tool is written in Rust, and it turned out that the underlying problem was the use of a non-deterministic proc_macro in an indirect dependency [1]. Since this is very much Rust-specific,¹ I wanted to document how I got to the bottom of this to ease future investigations of Rust-related reproducible build issues.

Using Diffoscope

The go-to tool for debugging reproducible builds is diffoscope. It is similar to the diff(1) tool that we all know and love but is better suited for comparing binaries, as it can transform them into human-readable formats before diffing. For example, for executable ELF binaries, it diffs the objdump(1) and readelf(1) output.²

To compare Rust binaries using diffoscope, we need to build them twice and compare the two resulting target/ directories as follows:³

$ cargo build --release && mv target target.1
$ cargo build --release && mv target target.2
$ diffoscope --html diff.html target.1 target.2

In many cases, inspecting the diff.html will already uncover the underlying issue. For example, a common reproducible build problem is including some sort of timestamp in the build artifact. Such issues are usually easily spotted using diffoscope.
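As a rough illustration of the timestamp problem, the sketch below shows one common way to avoid it: honoring the SOURCE_DATE_EPOCH environment variable (a convention from the Reproducible Builds project) instead of reading the wall clock. The function name build_timestamp is made up for this example and does not come from pimsync or any of its dependencies.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Pick the timestamp (seconds since the Unix epoch) to embed in a build artifact.
/// `source_date_epoch` is the raw value of the SOURCE_DATE_EPOCH
/// environment variable, if it is set. (Hypothetical helper for illustration.)
fn build_timestamp(source_date_epoch: Option<&str>) -> u64 {
    // Prefer the externally pinned value so that two builds agree.
    if let Some(secs) = source_date_epoch.and_then(|v| v.trim().parse::<u64>().ok()) {
        return secs;
    }
    // Fallback: the wall clock, which differs between builds and
    // therefore makes the artifact non-reproducible.
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}

fn main() {
    let pinned = std::env::var("SOURCE_DATE_EPOCH").ok();
    let ts = build_timestamp(pinned.as_deref());
    println!("embedding build timestamp {ts}");
}
```

Running this twice with the same SOURCE_DATE_EPOCH value prints the same timestamp; without the variable, each run embeds a different value, which is exactly the kind of difference diffoscope makes visible.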

However, in the pimsync case, the diffoscope output did not let me identify the root cause right away. There were too many differences across various files, and the data and text segments of the pimsync binary itself differed in subtle ways between every pair of builds that I compared. This is a general problem with diffing binaries: a small change to one function can shift offsets and reorder code throughout the entire binary.

Comparing Dependencies in Isolation

I decided to employ divide-and-conquer to isolate the non-deterministic build component. As Rust software commonly consists of a large number of dependencies, I wanted to check whether the issue was introduced by a particular crate. The Cargo build system stores the build artifacts for dependencies in target/release/deps/*.rlib. We can restrict diffoscope to comparing only these files using:

$ diffoscope --html diff-rlib.html \
	--exclude-directory-metadata=yes \
	--exclude '*.d' \
	target.1/release/deps target.2/release/deps

The resulting diff was much smaller and mainly identified differences in the *.rlib file of two crates:

  1. mail_parser: An email parsing library for Rust.
  2. calcard: An iCalendar parsing library.

These two crates are closely related: calcard depends on mail_parser. Therefore, I decided to investigate the mail_parser crate more closely. For this purpose, I obtained the source of this crate (and all other dependencies used by pimsync) using:

$ cargo vendor deps/

Afterward, I changed to the mail_parser directory in deps/ and checked its build output in isolation:

$ cargo build --release && mv target target.1
$ cargo build --release && mv target target.2
$ md5sum target.?/release/libmail_parser.rlib
84157fb428b5028da82fa6a0ce28f340  target.1/release/libmail_parser.rlib
886345de88bf42701f7e0894c413c24b  target.2/release/libmail_parser.rlib

Turns out, the mail_parser crate is itself not reproducible. Therefore, pimsync (which uses mail_parser) is also not reproducible! However, when comparing the target/release/deps/*.rlib files of mail_parser, I did not observe a difference in any of the dependencies of mail_parser. Consequently, I initially assumed that the issue must lie in this crate itself.
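The checksum comparison above only covers a single file. A minimal sketch of the same byte-for-byte check across a whole directory (for example, target.1/release/deps against target.2/release/deps), using only the Rust standard library, could look like the following. The function name differing_files is made up for this example, and the sketch assumes that both builds produce identically named files; it is a cheap pre-filter, not a replacement for diffoscope.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;
use std::path::Path;

/// Return the sorted names of regular files that are missing on one side
/// or whose bytes differ between directories `a` and `b` (non-recursive).
/// (Hypothetical helper for illustration.)
fn differing_files(a: &Path, b: &Path) -> io::Result<Vec<String>> {
    // Collect file names from both directories; a BTreeSet keeps the
    // output order stable across runs.
    let mut names = BTreeSet::new();
    for dir in [a, b] {
        for entry in fs::read_dir(dir)? {
            let entry = entry?;
            if entry.file_type()?.is_file() {
                names.insert(entry.file_name().to_string_lossy().into_owned());
            }
        }
    }

    let mut differing = Vec::new();
    for name in names {
        // A file that cannot be read on one side (e.g. because it is
        // missing there) counts as differing.
        let left = fs::read(a.join(&name)).ok();
        let right = fs::read(b.join(&name)).ok();
        if left.is_none() || right.is_none() || left != right {
            differing.push(name);
        }
    }
    Ok(differing)
}

fn main() -> io::Result<()> {
    let args: Vec<String> = std::env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: differing-files <dir1> <dir2>");
        std::process::exit(2);
    }
    for name in differing_files(Path::new(&args[1]), Path::new(&args[2]))? {
        println!("{name}");
    }
    Ok(())
}
```

The files it reports are the ones worth handing to diffoscope individually.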

Diffing LLVM IR

To compare different build iterations of mail_parser more closely, I wanted to diff the resulting LLVM IR, the intermediate representation that the Rust compiler hands to LLVM. We can force Cargo to emit this intermediate representation by invoking it as follows from the mail_parser source tree:

$ RUSTFLAGS="--emit=llvm-bc" cargo build --release && mv target target.1
$ RUSTFLAGS="--emit=llvm-bc" cargo build --release && mv target target.2

This will place the LLVM IR, encoded as LLVM bitcode, in target/release/deps/*.bc. Perhaps unintuitively, this also includes the bitcode of the mail_parser crate itself in target/release/deps/mail_parser-*.bc. LLVM bitcode is a binary format. Fortunately, our good friend diffoscope can also transform this format into a human-readable representation using llvm-dis. Therefore, we can compare the LLVM IR of the two builds using:

$ diffoscope --html diff-llvm.html \
	--exclude-directory-metadata=yes \
	--exclude '*.d' \
	--exclude '*.rlib' \
	--exclude '*.rmeta' \
	target.1/release/deps target.2/release/deps

This diff will include multiple chunks that look similar to the following:

@@ -28879,13 +28879,13 @@
   %hash = load i8, ptr %9, align 1, !noundef !3
   switch i8 %hash, label %bb84 [
     i8 119, label %bb21
-    i8 98, label %bb20
-    i8 115, label %bb26
-    i8 114, label %bb18
-    i8 118, label %bb17
-    i8 100, label %bb16
+    i8 118, label %bb20
+    i8 108, label %bb19
+    i8 98, label %bb18
+    i8 100, label %bb17
+    i8 114, label %bb16
     i8 103, label %bb15
-    i8 108, label %bb14
+    i8 115, label %bb36
   ]

This tells us that the two builds differ in how an LLVM IR switch instruction was generated: in this chunk, the same case values appear in both builds, but in a different order and with different target basic blocks. Like Rust, LLVM IR has functions; thus, we can check the name of the function to which the basic block with this difference belongs. I traced this back to the Rust function is_re_prefix from mail_parser, which is implemented as follows:

pub fn is_re_prefix(prefix: &str) -> bool {
    hashify::tiny_set! {prefix.as_bytes(),
        "re",
        "res",
        "sv",
        "antw",
        "ref",
        "aw",
        "απ",
        "השב",
        "vá",
        "r",
        "rif",
        "bls",
        "odp",
        "ynt",
        "atb",
        "رد",
        "回复",
        "转发",
    }
}

The is_re_prefix function uses a so-called proc_macro (i.e., a procedural macro) from the hashify crate.

Fun with Procedural Macros

Procedural macros allow execution of Rust code at compile time. The executed code operates on Rust syntax to enable flexible code generation. Obviously, this has some potential for becoming a footgun. Specifically, for reproducible builds, we must ensure that our procedural macros are deterministic.

Unfortunately, in the case of hashify, the tiny_set! procedural macro operates on a HashMap internally. In Rust, like in many other languages, iteration over a HashMap occurs in “arbitrary order”. In fact, the default hasher of Rust's standard HashMap is seeded randomly, so the iteration order can change from one compiler invocation to the next. Therefore, the tiny_set! macro produces non-deterministic results and causes the build of pimsync to not be reproducible.
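To make the effect concrete, here is a small sketch of a code generator that emits one match arm per key, once by iterating a HashSet and once by iterating a BTreeSet. This is not hashify's implementation; the function names are made up for illustration, and HashSet stands in for HashMap because both use the same randomly seeded default hasher in the standard library.

```rust
use std::collections::{BTreeSet, HashSet};

/// Emit one match arm per key, in whatever order the HashSet yields them.
/// The order can change from one process run to the next.
/// (Hypothetical generator for illustration.)
fn arms_unordered(keys: &[&str]) -> String {
    let set: HashSet<&str> = keys.iter().copied().collect();
    set.iter().map(|k| format!("{k:?} => true,\n")).collect()
}

/// Same generator, but iterating a BTreeSet, which always yields
/// its keys in sorted order.
fn arms_sorted(keys: &[&str]) -> String {
    let set: BTreeSet<&str> = keys.iter().copied().collect();
    set.iter().map(|k| format!("{k:?} => true,\n")).collect()
}

fn main() {
    let keys = ["re", "res", "sv", "antw", "ref", "aw"];
    println!("unordered:\n{}", arms_unordered(&keys));
    println!("sorted:\n{}", arms_sorted(&keys));
}
```

Running the program several times will typically show the "unordered" arms changing position between runs, while the "sorted" arms stay fixed. Switching to an ordered collection, or sorting before emitting code, is one common way to make such a generator deterministic.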

I reported this as a bug in hashify.


  1. Of course, this can occur in any programming language with extensive compile-time code generation support.↩︎

  2. Consequently, it is extremely helpful to build the program with debug information before running diffoscope.↩︎

  3. For the original pimsync issue, I also passed -j1 to Cargo for the initial build to ensure that this is not a parallel build problem.↩︎