Debarshi's den

Toolbx — running the same host binary on Arch Linux, Fedora, Ubuntu, etc. containers

This is a deep dive into some of the technical details of Toolbx and is a continuation from the earlier post about bypassing the immutability of OCI containers.

The problem

As we saw earlier, Toolbx uses a special entry point for its containers. It’s the toolbox executable itself.

$ podman inspect --format "{{.Config.Cmd}}" --type container fedora-toolbox-36
toolbox --log-level debug init-container ...

This is achieved by bind mounting the toolbox executable invoked by the user on the hosts to /usr/bin/toolbox inside the containers. While this has some advantages, it opens the door to one big problem. It means that executables from newer or different host operating systems might be running against older or different run-time environments inside the containers. For example, an executable from a Fedora 36 host might be running inside a Fedora 35 Toolbx, or one from an Arch Linux host inside an Ubuntu container.
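
For illustration, here is roughly what toolbox create sets up under the hood. This is a heavily simplified sketch, not the real invocation, which passes many more flags and mounts; the elided parts are marked with ellipses:

$ podman create --name fedora-toolbox-36 \
    --volume /usr/bin/toolbox:/usr/bin/toolbox:ro \
    ... \
    registry.fedoraproject.org/fedora-toolbox:36 \
    toolbox --log-level debug init-container ...

Everything after the image name becomes the container's Cmd, which is what the podman inspect output above shows.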

Such a mismatch is very unusual. We only expect executables from an older version of an OS to keep working on newer versions of the same OS, but never the other way round, and definitely not across different OSes.

When binaries are compiled and linked against newer run-time environments, they may start relying on symbols (i.e., non-static global variables, functions, class and struct members, etc.) that are missing in older environments. For example, glibc-2.32 (used in Fedora 33 onwards) added a new version of the pthread_sigmask symbol. If toolbox binaries built and linked against glibc-2.32 are run against older glibc versions, then they will refuse to start.

$ objdump -T /usr/bin/toolbox | grep GLIBC_2.32
0000000000000000      DO *UND*        0000000000000000  GLIBC_2.32  pthread_sigmask

This means that one couldn’t use Fedora 32 Toolbx containers on Fedora 33 hosts, or similarly any containers with glibc older than 2.32 on hosts with newer glibc versions. That’s quite the bummer.
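
For the curious, this is roughly what the failure looks like when the dynamic linker refuses to start the binary; the exact wording varies across glibc versions:

$ toolbox
toolbox: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by toolbox)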

If the executables are not ELF binaries, but carefully written POSIX shell scripts, then this problem goes away. Incidentally, Toolbx used to be implemented in POSIX shell until it was rewritten in Go two years ago, which is how it avoided this problem for a while.

Fortunately, Go binaries are largely statically linked, with the notable exception of the standard C library. The scope of the problem would be much bigger if it involved several other dynamic libraries, like in the case of C or C++ programs.

Potential options

In theory, the easiest solution is to build the toolbox binary against the oldest supported run-time environment so that it doesn’t rely on newer symbols. However, it’s easier said than done.

Usually downstream distributors use build environments that are composed of components that are part of that specific version of the distribution. For example, it would be unusual for an RPM for a certain Fedora version to be deliberately built against a run-time from an older Fedora. Carlos O’Donell had an interesting idea on how to implement this in Fedora by only ever building for the oldest supported branch, adding a noautobuild file to disable the mass rebuild automation, and having newer branches always inherit the builds from the oldest one. However, this won’t work either. Building against the oldest supported Fedora won’t be enough for Fedora’s Toolbx because, by definition, Toolbx is meant to run different kinds of containers on hosts. The oldest supported Fedora hosts might still be too new compared to containers of supported Debian, Red Hat Enterprise Linux, Ubuntu etc. versions.

So, yes, in theory, this is the easiest solution, but, in practice, it requires a non-trivial amount of cross-distribution collaboration, and downstream build system and release engineering effort.

The second option is to have Toolbx containers provide their own toolbox binary that’s compatible with the run-time environment of the container. This would substantially complicate the communication between the toolbox binaries on the hosts and the ones inside the containers, because the binaries on the hosts and containers will no longer be exactly the same. The communication channel between commands like toolbox create and toolbox enter running on the hosts, and toolbox init-container inside the containers can no longer use a private and unstable interface that can be easily modified as necessary. Instead, it would have complicated backwards and forwards compatibility requirements. Other than that, it would complicate bug reports, and every single container on a host may need to be updated separately to fix bugs, with updates needing to be co-ordinated across downstream distributors.

The next option is to either statically link against the standard C library, or disable its use in Go. However, that would prevent us from using glibc’s Name Service Switch to look up usernames and groups, or to resolve host names. The replacement code, written in pure Go, can’t handle enterprise set-ups involving Network Information Service and Lightweight Directory Access Protocol, nor can it talk to host OS services like SSSD, systemd-userdbd or systemd-resolved.

It’s true that Toolbx currently doesn’t support enterprise set-ups with NIS and LDAP, but not using NSS will only make it more difficult to add that support in the future. Similarly, we don’t resolve any host names at the moment, but given that we are in the business of pulling content over the network, it can easily become necessary in the future. Disabling the use of NSS would leave the toolbox binary as an odd thing that behaves differently from the rest of the OS for some fundamental operations.
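
For completeness, here is roughly what that option would look like. This is a sketch of the idea, not how Toolbx is actually built. CGO_ENABLED=0 disables cgo entirely, while the osusergo and netgo build tags force the pure Go implementations even when cgo is available:

$ CGO_ENABLED=0 go build -tags osusergo,netgo -o toolbox .
$ ldd ./toolbox
    not a dynamic executable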

An extension of the previous option is to split the toolbox executable into two. One dynamically linked against the standard C library for the hosts, and another that has no dynamic linkage to run inside the containers as their entry point. This can impact backwards compatibility and affect the developer experience of hacking on Toolbx.

Existing Toolbx containers want to bind mount the toolbox executable from the host to /usr/bin/toolbox inside the containers and run toolbox init-container as their entry point. This can’t be changed because of the immutability of OCI containers, and Toolbx simply can’t afford to break existing containers in a way where they can no longer be entered. This means that the toolbox executable needs to become a shim, without any dynamic linkage, that forwards the invocation to the right executable depending on whether it’s running on the hosts or inside the containers.
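
Such a shim could be quite small. Here is a minimal sketch in Go, assuming the real executables are installed as toolbox-host and toolbox-container under /usr/libexec; those names and paths are hypothetical, not something Toolbx ships. Built with CGO_ENABLED=0, the shim would have no dynamic linkage, which is what the entry point requires:

package main

import (
    "fmt"
    "os"
    "syscall"
)

func main() {
    // Podman creates /run/.containerenv inside its containers, so its
    // presence is a reasonable heuristic for running inside one.
    target := "/usr/libexec/toolbox-host"
    if _, err := os.Stat("/run/.containerenv"); err == nil {
        target = "/usr/libexec/toolbox-container"
    }

    // Replace the shim with the real executable, preserving the
    // arguments and the environment it was invoked with.
    if err := syscall.Exec(target, os.Args, os.Environ()); err != nil {
        fmt.Fprintln(os.Stderr, "toolbox:", err)
        os.Exit(1)
    }
}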

That brings us to the developer experience of hacking on Toolbx. The first thing to note is that we don’t want to go back to using POSIX shell to implement the executable that’s meant to run inside the container. Ondřej spent a lot of effort replacing the POSIX shell implementation of Toolbx, and we don’t want to undo any part of that. Ideally, we would use the same programming language (i.e., Go) to implement both executables so that one doesn’t need to learn multiple disparate languages to work on Toolbx. However, even if we do use Go, we would have to be careful not to share code across the two executables, or be aware that they may have subtle differences in behaviour depending on how they might be linked.

Then there’s the developer experience of hacking on Toolbx on Fedora Silverblue and similar OSTree-based OSes, which is what you would do to eat your own dog food. Experiences are always subjective, and this one is unique to hacking on Toolbx from inside a Toolbx container. So let’s take a moment to understand the situation.

On OSTree-based OSes, Toolbx containers are used for development, and, generally speaking, it’s better to use container-specific locations invisible to the host as the development prefixes because the generated executables are specific to each container. Executables built on one container may not work on another, and not on the hosts either, because of the run-time problems mentioned above. Plus, it’s good hygiene not to pollute the hosts.

Similar to Flatpak and Podman, Toolbx is a tool that sets up containers. This means that unlike most other executables, toolbox must be on the hosts because, barring the init-container command, it can’t work inside the containers. The easiest way to do this is to have a separate terminal emulator with a host shell, and invoke toolbox directly from Meson’s build directory in $HOME that’s shared between the hosts and the Toolbx containers, instead of installing toolbox to the container-specific development prefixes. Note that this only works because toolbox has always been implemented in programming languages with little to no dynamic linking, and only if you ensure that the Toolbx containers used for hacking on Toolbx match the hosts. Otherwise, you might run into the run-time problems mentioned above.
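
Concretely, the workflow looks something like this, assuming the built binary lands in builddir/src; the exact path depends on the source layout:

$ # inside the Toolbx container used for development:
$ meson setup builddir
$ meson compile -C builddir

$ # from a separate host shell:
$ ./builddir/src/toolbox create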

The moment there is one executable invoking another, the executables need to be carefully placed on the file system so that one can find the other one. This means that either the executables need to be installed into development prefixes or that the shim should have special logic to work out the location of the other binary when invoked directly from Meson’s build directory.

The former is a problem because the development prefixes will likely default to container-specific locations invisible from the hosts, preventing the built executables from being trivially invoked from the host. One could have a separate development prefix only for Toolbx that’s shared between the containers and the hosts. However, I suspect that a lot of existing and potential Toolbx contributors would find that irksome. They either don’t know how to, or don’t want to, set up a prefix manually, and instead use something like jhbuild to do it for them.

The latter requires two different sets of logic depending on whether the shim was invoked directly from Meson’s build directory or from a development prefix. At the very least this would involve locating the second executable from the shim, but could grow into other areas as well. These separate code paths would be crucial enough that they would need to be thoroughly tested. Otherwise, Toolbx hackers and users won’t share the same reality. We could start by running our test suite in both modes, and then meticulously increase coverage, but that would come at the cost of a lengthier test suite.
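
The location logic itself isn’t much code. Here is a sketch in Go, with a hypothetical toolbox-host as the second executable and a hypothetical <prefix>/libexec installed layout:

package main

import (
    "fmt"
    "os"
    "path/filepath"
)

// companionPath works out where the dynamically linked executable
// lives, relative to the shim itself.
func companionPath() (string, error) {
    self, err := os.Executable()
    if err != nil {
        return "", err
    }

    dir := filepath.Dir(self)

    // When running uninstalled from Meson's build directory, assume
    // the companion sits next to the shim.
    uninstalled := filepath.Join(dir, "toolbox-host")
    if _, err := os.Stat(uninstalled); err == nil {
        return uninstalled, nil
    }

    // Otherwise assume an installed layout, e.g. <prefix>/bin for the
    // shim and <prefix>/libexec for the companion.
    return filepath.Join(dir, "..", "libexec", "toolbox-host"), nil
}

func main() {
    path, err := companionPath()
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Println(path)
}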

Failed attempts

Since glibc uses symbol versioning, it’s sometimes possible to use some .symver hackery to avoid linking against newer symbols even when building against a newer glibc. This is what Toolbx used to do to ensure that binaries built against newer glibc versions still ran against older ones. However, this doesn’t defend against changes to the start-up code in glibc, like the one in glibc-2.34 that performed some security hardening.
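
In a Go code base, that hackery goes through cgo. Here is a rough sketch of the general trick, assuming pthread_sigmask@GLIBC_2.2.5 (the x86_64 baseline version) as the symbol to bind against; note that the directive only affects references made from that translation unit, which hints at why the approach is fragile:

package symver

// /* Redirect references to pthread_sigmask from this translation unit
//    to the old GLIBC_2.2.5 version, instead of the newest one from the
//    build-time glibc. */
// #include <signal.h>
// __asm__(".symver pthread_sigmask, pthread_sigmask@GLIBC_2.2.5");
import "C"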

Current solution

Alexander Larsson and Ray Strode pointed out that all non-ancient Toolbx containers have access to the hosts’ /usr at /run/host/usr. In other words, Toolbx containers have access to the host run-time environments. So, we decided to ensure that toolbox binaries always run against the host run-time environments.

The toolbox binary has an rpath pointing to the hosts’ libc.so somewhere under /run/host/usr, and its dynamic linker (i.e., PT_INTERP) is changed to the one inside /run/host/usr. Unfortunately, there can only be one PT_INTERP entry inside the binary, so there must be a /run/host on the hosts too for the binary to work on them. Therefore, a /run/host symbolic link is also created on the hosts, pointing to the hosts’ /.
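
One way to produce such a binary is to use external linking and hand the corresponding flags to the GNU linker. This is a sketch of the idea, not Toolbx’s actual build flags:

$ go build -o toolbox -ldflags '-linkmode external -extldflags "-Wl,-rpath,/run/host/usr/lib64 -Wl,--dynamic-linker,/run/host/usr/lib64/ld-linux-x86-64.so.2"' .

$ # the symbolic link that keeps the binary working on the hosts:
$ sudo ln -s / /run/host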

The toolbox binary now looks like this, both on the hosts and inside the Toolbx containers:

$ ldd /usr/bin/toolbox
    linux-vdso.so.1 (0x00007ffea01f6000)
    libc.so.6 => /run/host/usr/lib64/libc.so.6 (0x00007f6bf1c00000)
    /run/host/usr/lib64/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007f6bf289a000)

It’s been almost a year and thus far this approach has held its own. I am mildly bothered by the presence of the /run/host symbolic link on the hosts, but not enough to lose sleep over it.

Other options

Recently, Robert McQueen brought up the idea of possibly using the Linux kernel’s binfmt_misc mechanism to modify the toolbox binary on the fly. I haven’t explored this in any seriousness, but maybe I will if the current set-up doesn’t work out.
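
For reference, binfmt_misc handlers are registered through procfs with the fields :name:type:offset:magic:mask:interpreter:flags. The following is purely hypothetical; a real handler would need a far more selective magic and mask than plain ELF, and toolbox-wrapper is an invented name:

$ echo ':toolbox:M::\x7fELF::/run/host/usr/bin/toolbox-wrapper:' | sudo tee /proc/sys/fs/binfmt_misc/register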

Written by Debarshi Ray

2 October, 2022 at 19:28

2 Responses

  1. Have you considered using the host’s dynamic linker and libc in the toolbox? The dynamic linker is both a library and a program. Instead of having the kernel run the dynamic linker as it does by finding the PT_INTERP in the target ELF program, you can just invoke the one you want directly and give it any additional library paths. Just try running /lib64/ld-linux-x86-64.so.2 with no options.

    So, you could do something like:

    1. Get the name of the interpreter from the host toolbox (something like readelf -p .interp /usr/bin/toolbox).
    2. Resolve the path to it (in case it’s an absolute symlink).
    3. Inside the container execute /run/host/path/to/ld-X.YY.so --library-path /run/host/usr/lib64:/run/host/usr/lib /run/host/usr/bin/toolbox …

    Probably the exact library path would need some more heuristics.

    dbnicholson

    4 October, 2022 at 00:17

    • Yes. Currently, when we talk about always running /usr/bin/toolbox against its build-time ABI or the host’s run-time environment, we are basically talking about the dynamic linker and the C library.

      However, the exact implementation at the moment is a bit different from the one you describe, because existing Toolbx containers have their entry point set to `/usr/bin/toolbox`, and we cannot change that. In other words, we can’t invoke the dynamic linker as an executable inside the container unless we insert `/usr/bin/toolbox` as a shim that does that.

      So far, we have tried to avoid a shim so that we don’t have to co-ordinate multiple binaries. That might have to change if the current approach doesn’t work out.

      Debarshi Ray

      6 October, 2022 at 15:25

