Toolbx — bypassing the immutability of OCI containers

This is a deep dive into some of the technical details of Toolbx. I find myself regularly explaining them to various people, so I thought that I should write them down. Feel free to read and comment, or you can also happily ignore it.

The problem

OCI containers are famous for being immutable. Once a container has been created with podman create, it’s attributes can’t be changed anymore. For example, the bind mounts, the environment variables, the namespaces being used, and all the other attributes that can be specified via options to the podman create command. This means that once there’s a Toolbx, it wouldn’t be possible to give it access to a new set of files from the host if the need arose. The Toolbx would have to be deleted and re-created with access to the new paths.

This is a problem, because a Toolbx is where the user sets up her development and troubleshooting environment. Re-creating a Toolbx might mean reinstalling a number of different packages, tweaking configuration files, redeploying various artifacts and so on. Having to repeat all that in the middle of a long hacking session, just because the container’s attributes need to be tweaked, can be annoying.

This is unlike Flatpak containers, where it’s possible to override the permissions of a Flatpak either persistently through flatpak override or temporarily during flatpak run.

Secondly, as the Toolbx code evolves, we want to be able to transparently update existing Toolbxes to enable new features and fix bugs. It would be a real drag if users had to consciously re-create their containers.

The solution

Toolbx bypasses this by using a special entry point for the container. Those inquisitive types who have run podman inspect on a Toolbx container might have noticed that the toolbox executable itself is the container’s entry point.

$ podman inspect --format "{{.Config.Cmd}}" --type container fedora-toolbox-36
toolbox --log-level debug init-container ...

This means that when Toolbx starts a container using podman start, the toolbox init-container command gets run as the first process inside the container. Only after this has run, does the user’s interactive shell get spawned.

Instead of setting up the container entirely through podman create, Toolbx tries to use this reflexive entry point as much as possible. For example, Toolbx doesn’t use podman create --volume /tmp:/tmp to give access to the host’s /tmp inside the container. It bind mounts the entire root filesystem from the host at /run/host in the container with podman create --volume /:/run/host. Then, later when the container is started, toolbox init-container recursively bind mounts the container’s /run/host/tmp to /tmp. Since the container has its own mount namespace, the /run/host and /tmp bind mounts are neatly hidden away from the host.

Therefore, if in future additional host locations need to be exposed within the Toolbx, then those can be added to toolbox init-container, and once the user restarts the container after updating the toolbox executable, the new locations will show up inside the existing container. Similarly, if the mount parameters of an existing location need to be tweaked, or if a host location needs to be removed from the container.

This is not restricted to just bind mounts from the host. The same approach with toolbox init-container is used to configure as many different aspects of the container as possible. For example, setting up users, keeping the timezone and DNS configuration synchronized with the host, and so on.

Further details

One might wonder how a Toolbx container manages to have a toolbox executable inside it, especially since the toolbox package is not installed within the container. It is achieved by bind mounting the toolbox executable invoked by the user on the host to /usr/bin/toolbox inside the container.

This has some advantages.

There is always only one version of the toolbox executable that’s involved — the one that’s on the host. This means that the exact invocation of toolbox init-container, which is baked into the Toolbx and shows up in podman inspect, is the only interface that needs to be kept stable as the Toolbx code evolves. As long as toolbox init-container can be invoked with that specific command line, everything else can be changed because it’s the same executable on both the host and inside the container.

If the container had a separate toolbox package in it, then the user might have to separately update another executable to get the expected results, and we would have to ensure that different mismatched versions of the executable can work with each other across the host and the container. With a growing number of containers, the former would be a nightmare for the user, while the latter would be almost impossible to test.

Finally, having only one version of the toolbox executable makes it a lot easier for users to file bug reports. There’s only one version to report, not several spread across different environments.

This leads to another problem

Once you let this sink in, you might realize that bind mounting the toolbox executable from the host into the Toolbx means that an executable from a newer or different operating system might be running against an older or different run-time environment inside the container. For example, an executable from a Fedora 36 host might be running inside a Fedora 35 Toolbx, or one from an Arch Linux host inside an Ubuntu container.

This is very unusual. We only expect executables from an older version of an OS to keep working on newer versions of the same OS, but never the other way round, and definitely not across different OSes.

I will leave you with that thought and let you puzzle over it, because it will be the topic of a future post.

Written by Debarshi Ray

22 July, 2022 at 18:48

Posted in Buildah, Containers, CoreOS, Fedora, OSTree, Podman, Silverblue, Toolbox, Toolbx

Debarshi's den