Debarshi's den

Toolbx is a release blocker for Fedora 39 onwards

This is the second instalment of my 2023 retrospective series on Toolbx.¹

One very important thing that we did behind the scenes was to make Toolbx a release blocker for Fedora 39 and onwards. This means that the registry.fedoraproject.org/fedora-toolbox OCI image is considered a release-blocking deliverable, and there are release-blocking test criteria to ensure that the toolbox RPM is usable.

Why do that?

Earlier, there was no formal requirement for Toolbx to be usable when a new Fedora was released. That was a problem for a tool that’s so popular and provides something as fundamental as an interactive command line environment for software development and troubleshooting the host operating system. Everybody expects their CLI environment to just work even under very adverse conditions, and Toolbx should be no different. Except that Toolbx is slightly more complicated than running Bash or Z shell directly on the host OS, and, therefore, requires a bit more diligence.

Toolbx has two parts — an OCI image, which defaults to registry.fedoraproject.org/fedora-toolbox on Fedora hosts, and the toolbox RPM. The OCI image is pulled by the RPM to set up a containerized interactive CLI environment.
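
For example, on a Fedora 39 host the default image can also be fetched manually with Podman. This is a sketch of what the RPM effectively does behind the scenes:

$ podman pull registry.fedoraproject.org/fedora-toolbox:39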

Let’s look at each separately.

The image

First, we wanted to ensure that there is an up-to-date fedora-toolbox OCI image published on registry.fedoraproject.org as a release-blocking deliverable at critical points in the development schedule, just like the installation ISOs for the Editions from download.fedoraproject.org. For example, when an upcoming Fedora release is branched from Rawhide, and for the Beta and Final releases.

One of the recurring complaints that we used to get was from users of Fedora Rawhide Toolbx containers, when Rawhide gets branched in preparation for the Beta of the next Fedora release. At this point, the previous Rawhide version becomes the Branched version, and the current Rawhide version increases by one. If the fedora-toolbox images aren’t part of the mass branching performed by Fedora Release Engineering, then someone has to quickly step in after they have finished to refresh the images to ensure consistency. This sort of ad hoc manual co-ordination rarely works, and it left users in the lurch.

With this change, the fedora-toolbox image is part of the nightly Fedora composes, and the branching is handled by Fedora Release Engineering just like any other release-blocking deliverable. This makes the image as readily available and updated as the fedora and fedora-minimal OCI images or any other deliverable, and we hope that it will improve the user experience for Rawhide Toolbx containers.

If someone installs the Fedora Beta or the Final on their host, and creates a Toolbx container using the default image, then, barring exceptions, the host and the container now have the same RPM versions for all packages, just like Fedora Silverblue and Workstation are released with the same versions. This ensures greater consistency in terms of bug-fixes, features and pending updates.

In the past, this wasn’t the case and it led to occasional surprises. For example, the change to make RPM use a Sequoia based OpenPGP parser made it impossible to install third party RPMs in the fedora-toolbox image, even long after the actual bug was fixed.

The RPM

Second, we wanted to have release-blocking test criteria to ensure that the toolbox RPM is usable at critical points in the development schedule. This is to ensure that changes in the Toolbx stack, and future changes in other parts of the operating system do not break Toolbx — at least not for the Beta and Final releases. It’s good to have the fedora-toolbox image be more readily available and updated, but it’s better if Toolbx works more reliably as a whole.

Examples of changes in the Toolbx stack causing breakage can be FUSE preventing RPMs with file capabilities from being installed inside Toolbx containers, Toolbx bind mounts preventing RPMs with %attr() from being installed or causing systemd-tmpfiles(8) to throw errors, etc. Examples of changes in other parts of the OS can be changes to Fedora’s Kerberos stack causing Kerberos to stop working inside Toolbx containers, changes to the sysctl(8) configuration breaking ping(8), changes in Mutter breaking graphical applications, etc.

The test criteria for the toolbox RPM also implicitly test the fedora-toolbox image, and co-ordinate several disparate groups of developers to ensure that the containerized interactive command line Toolbx environments on Fedora are just as reliable as those running directly on the host OS.

Tooling changes

This does come with a significant tooling change that isn’t obvious at first. The fedora-toolbox OCI image is no longer defined as a layered image through a Container/Dockerfile. Instead, it’s built as a base image through Kickstarts and Pungi, just like the fedora and fedora-minimal images.

This was necessary because the nightly Fedora composes work with Kickstarts and Pungi, not Container/Dockerfiles. Moreover, building Fedora OCI images from a Dockerfile with fedpkg container-build uses an ancient unmaintained version of OpenShift Build Service that requires equally unmaintained ancient versions of Fedora to run, and the fedora-toolbox image was the only thing using Container/Dockerfiles in Fedora.

We either had to update the Fedora infrastructure to use OpenShift Build Service 2.x; or use Kickstarts and Pungi, which uses Image Factory, to build the fedora-toolbox image. We chose the latter, because updating the infrastructure would be a significant effort, and by using Kickstarts and Pungi we get to stay close to the fedora and fedora-minimal images and simplify the infrastructure.

The Fedora Flatpaks were also being built using the same ancient and unmaintained version of OpenShift Build Service, and they too are in the process of being migrated. However, that’s outside the scope of this post.

One big benefit of fedora-toolbox not being a layered image based on top of the fedora image is that it removes the constant fight against the efforts to minimize the size of the latter. The fedora-toolbox image is designed for interactive command line use in long-lived containers, and not for deploying server-side applications and services in ephemeral ones. This means that dictionaries, documentation, locales, iconv converter modules, translations, etc. are more important than reducing the size of the images. Now that the image is built from scratch, we have full control over what goes into it.

Unfortunately, Image Factory is weakly maintained, and setting it up on one’s local machine is a lot more complicated than using podman build. One can do scratch builds on the Fedora infrastructure with koji image-build --scratch, but only if one has been explicitly granted permissions, and then one has to download the tarball and use skopeo copy to place the image in containers-storage so that Podman can see it. All that is again more complicated than doing a podman build.
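
Roughly, that workflow looks like this (a sketch with hypothetical file and image names; the exact transport depends on the format of the built tarball):

$ koji image-build --scratch fedora-toolbox ...
$ skopeo copy docker-archive:fedora-toolbox-39.tar containers-storage:localhost/fedora-toolbox:39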

Due to this difficulty of untangling the image build from the Fedora infrastructure, we haven’t published the sources of the fedora-toolbox image for recent Fedora versions upstream. We do have a fedora-toolbox:39 image defined through a Container/Dockerfile, but that was done purely as a contingency during the Fedora 39 development cycle.

This does degrade the developer experience of working on the fedora-toolbox image, but, given all the other advantages, we think that it’s worth it.

As of this writing, there’s a Fedora 40 Change to switch to using KIWI to build the OCI images, including fedora-toolbox, instead of Image Factory. KIWI seems more strongly maintained and a lot easier to set up locally, which is fantastic. So, it should be all rainbows and unicorns, once we soldier through another port of the fedora-toolbox image to a different tooling and source language.

Acknowledgements

Last but not least, getting all this done on time required a good deal of co-ordination and help from several different individuals. I must thank Sumantro for leading the effort; Kevin, Tomáš and Samyak for all the infrastructure and release engineering work; and Adam and Kamil for all the testing and validation.

  1. Toolbx now offers built-in support for Arch Linux and Ubuntu

Written by Debarshi Ray

1 March, 2024 at 13:44

Toolbx now offers built-in support for Arch Linux and Ubuntu

… and why did it take so long for that to happen?

The year 2023 was a busy one in Toolbx land. We had two big releases, and one of the important things we did was to start offering built-in support for Arch Linux and Ubuntu.

What does that mean?

It means that if you have the latest Toolbx release, then a simple toolbox create on Arch Linux and Ubuntu hosts will create matching Toolbx containers, instead of a fallback Fedora container. If you are using some other host operating system, then you can get the same with:

$ toolbox create --distro arch
$ toolbox create --distro ubuntu --release 22.04

It also means that we will do our best to treat Arch Linux and Ubuntu on equal terms with Fedora and avoid regressions.

Now, that last sentence has a lot more to it than may seem at first. So, I am going to try to unwrap it to show why I think this is one of the important things we did in 2023. You may find it interesting if you want the set of supported Linux distributions to expand further. You can also skip the details and go straight to the end to get the gist of it.

Why go beyond Fedora?

Even though Toolbx was created for the immediate needs of Fedora Silverblue, it has long since exceeded the bounds of Fedora and OSTree based operating systems. At the same time, Toolbx only had built-in support for Fedora and Red Hat Enterprise Linux. This disconnect led to a lot of people eagerly asking for better support for Linux distributions beyond the Fedora family.

From the outside, it looks like just a matter of putting together a Containerfile that layers some extra content on top of a base image, and then publishing it on some OCI registry somewhere. However, it’s not that simple.

Why did it take so long?

Toolbx is a glue. Given an existing host operating system and a desired command line environment, it will create and set up an OCI container with the desired CLI environment that has seamless access to the user’s home directory, the Wayland and X11 sockets, networking (including Avahi), removable devices (like USB sticks), systemd journal, SSH agent, D-Bus, ulimits, /dev and the udev database, etc.

The closest analogy I can think of is libosinfo generating scripts to perform automated installations of various operating systems as virtual machines. Except, Toolbx containers aren’t expected to be as cleanly or formally separated from the host as VMs are, and the world of OCI containers is designed for deploying server-side applications and services, not for the persistent interactive use that Toolbx offers. This means that Toolbx has to carefully make each host OS version match the container, and make the container fit for interactive use. For example, libosinfo doesn’t have to worry about making the VM’s command line shell prompt and /etc/resolv.conf work with the host’s, nor does it have to convert the cloud image of an OS into a variant suited for a desktop or laptop.

This means that Toolbx has to very carefully set up the container to work with the particular host operating system version in use. For example, there can be subtle differences between running a Fedora 39 container on a Fedora 38 host and vice versa. If there are three different Fedora versions that we care about (such as Fedoras 38, 39 and 40), then that’s nine different combinations of container and host OSes to support. Sometimes, the number of relevant versions goes from three to four, and the number of combinations jumps to sixteen. Now, add Red Hat Enterprise Linux to the mix. Assuming that we only care about the latest point-releases (such as RHELs 8.9 and 9.3), we are now looking at twenty-five to thirty-six combinations. In reality, it’s a lot more, because we care about more than just the latest RHEL point-releases.

You can see where this is going.

With this addition of Arch Linux and Ubuntu, we have added at least seven new versions that we care about. That’s around a dozen versions in play, which, squared, makes a total of approximately one hundred and fifty combinations!

I can assure you that this isn’t a theoretical concern. Here’s a bug about domain name resolution being completely broken in Toolbx containers for Fedoras 39 and 40 on Red Hat Enterprise Linux 9 hosts.

Tests

So, to promise with a straight face that we will do our best to treat Arch Linux and Ubuntu on equal terms with Fedora and avoid regressions, we need to automate this. There’s no way a test matrix of this size can be tackled manually. Hence, tests.

However, it’s easier said than done, because it’s not just about having the tests. We also need to run them on as many different host operating system versions as possible. Unfortunately, most continuous integration systems out there either only offer containers, which are useless because Toolbx needs to be tested on the host OS, or they offer Ubuntu hosts. One exception is Software Factory, which runs an instance of Zuul CI, and offers Fedora and CentOS Stream hosts.

Currently, we have a test suite with more than three hundred tests that covers a good chunk of all supported OCI containers and images. All these tests are run separately on hosts representing all active Fedora versions, with subsets being run on CentOS Stream 9 and Ubuntu 22.04 hosts. We are working on ensuring that the entire test suite gets run on CentOS Stream 9 and Ubuntu 22.04, and are hoping to introduce CentOS Stream 8 and Ubuntu 24.04 hosts to the mix.

Plus, these aren’t just simple smoke tests. They don’t just create and start the containers, and check for a successful exit code. They comprehensively poke at various attributes of the containers’ runtime environment and error conditions to ensure that things really do work as advertised.

I think this is a pretty impressive set-up. It took time to build it, and it’s still not done, but I think it was worth it.

We have also been busy in Fedora, pushing the quality of the Toolbx stack up a notch, but that’s a topic for another post.

So, if you want to see built-in support for your favourite Linux distribution, then please help us by providing some test runners. Right now we could really use one for Arch Linux to maintain our existing support for it, and one for Debian because we want to include it in the near future.

Maintainers

Finally, since Toolbx is a glue, sometimes we need to drive changes into the Linux distributions that we claim to support. For example, changing the sysctl(8) configuration to make ping(8) work, fixes to the start-up scripts for Bash and Z shell, etc. This means that we need maintainers who will own the work that’s specific to a particular Linux distribution. As a Fedora contributor, I can take care of Fedora, but I cannot sign up to take care of every single distribution that’s out there.
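
As an aside, the ping(8) case boils down to the range of groups that the kernel allows to use unprivileged ICMP sockets, which Fedora widens through a sysctl. Roughly, assuming Fedora’s current default:

$ sysctl net.ipv4.ping_group_range
net.ipv4.ping_group_range = 0 2147483647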

In that sense, I am delighted that we have a bunch of dedicated folks taking care of the Arch Linux and Ubuntu support. Namely, Morten Linderud for Arch Linux, and Andrej Shadura and Ievgen Popovych for Ubuntu.

Conclusion

Two things need to happen for us to add built-in support for a new Linux distribution.

First, we need test runners that let us run our upstream test suite on the host operating system, and not inside a container. Right now we could really use one for Arch Linux to maintain our existing support for it, and one for Debian because we want to include it in the near future.

Second, we need someone to step up to drive changes into the Linux distribution in question, and own the work that’s specific to it.

I am also going to add this to the Toolbx documentation, so that it’s easy to find in future.

Written by Debarshi Ray

20 January, 2024 at 04:28

Fedora meets RHEL: upgrading UBI to RHEL

Six years ago

As part of our efforts to make Fedora Workstation more attractive for developers, particularly those building applications that would be deployed on Red Hat Enterprise Linux, we had made it easy to create gratis, self-supported Red Hat Enterprise Linux virtual machines inside GNOME Boxes. You had to join the Red Hat Developer Program by creating an account on developers.redhat.com and with that you not only had gratis, self-supported access to RHEL, but also a series of other products like Red Hat JBoss Middleware, Red Hat OpenShift Container Platform and so on.

A few years later

Fedora Silverblue became a thing, and so did Toolbx. So, we decided to take this one step further.

Toolbx has had support for Red Hat Enterprise Linux for the past two and a half years. This means that on RHEL hosts, Toolbx sets up a RHEL container that has access to all the software and support that the host is entitled to. On hosts that aren’t running RHEL, if you want, it will set up a gratis, self-supported container based on the Red Hat Universal Base Image, which is a limited subset of RHEL:

$ toolbox create --distro rhel --release 9.2

However

This works well only as long as you are running a Red Hat Enterprise Linux host. Otherwise, you quickly run into the limitations of the Red Hat Universal Base Image, which is designed for distributing server-side applications, and not so much for persistent interactive CLI environments. For example, you won’t enjoy hacking on GNOME components like GTK, Settings, Shell or WebKitGTK in it because of the sheer number of missing dependencies.

Today is glorious

You can now have gratis, self-supported Red Hat Enterprise Linux Toolbx containers on Fedora hosts that have access to the entire set of RHEL software, beyond the limited subset offered by UBI.

As always, you need to join the Red Hat Developer Program. Make sure that your account has simple content access enabled, by logging into access.redhat.com and then clicking the subscriptions link at the very top left.

Install subscription-manager on your Fedora host to register it with your Red Hat Developer Program account, create a RHEL Toolbx container as before, and you are off to the races!
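
In concrete terms, the steps look roughly like this (a sketch; the exact registration flags may vary):

$ sudo dnf install subscription-manager
$ sudo subscription-manager register
$ toolbox create --distro rhel --release 9.2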

Thanks to Pino Toscano for helping me iron out all the wrinkles in the pipeline to get everything working smoothly.

Written by Debarshi Ray

25 August, 2023 at 21:41

Toolbx — running the same host binary on Arch Linux, Fedora, Ubuntu, etc. containers

This is a deep dive into some of the technical details of Toolbx and is a continuation from the earlier post about bypassing the immutability of OCI containers.

The problem

As we saw earlier, Toolbx uses a special entry point for its containers. It’s the toolbox executable itself.

$ podman inspect --format "{{.Config.Cmd}}" --type container fedora-toolbox-36
toolbox --log-level debug init-container ...

This is achieved by bind mounting the toolbox executable invoked by the user on the hosts to /usr/bin/toolbox inside the containers. While this has some advantages, it opens the door to one big problem. It means that executables from newer or different host operating systems might be running against older or different run-time environments inside the containers. For example, an executable from a Fedora 36 host might be running inside a Fedora 35 Toolbx, or one from an Arch Linux host inside an Ubuntu container.

This is very unusual. We only expect executables from an older version of an OS to keep working on newer versions of the same OS, but never the other way round, and definitely not across different OSes.

When binaries are compiled and linked against newer run-time environments, they may start relying on symbols (i.e., non-static global variables, functions, class and struct members, etc.) that are missing in older environments. For example, glibc-2.32 (used in Fedora 33 onwards) added a new version of the pthread_sigmask symbol. If toolbox binaries built and linked against glibc-2.32 are run against older glibc versions, then they will refuse to start.

$ objdump -T /usr/bin/toolbox | grep GLIBC_2.32
0000000000000000      DO *UND*        0000000000000000  GLIBC_2.32  pthread_sigmask

This means that one couldn’t use Fedora 32 Toolbx containers on Fedora 33 hosts, or similarly any containers with glibc older than 2.32 on hosts with newer glibc versions. That’s quite the bummer.

If the executables are not ELF binaries, but carefully written POSIX shell scripts, then this problem goes away. Incidentally, Toolbx used to be implemented in POSIX shell, until it was re-written in Go two years ago, which is how it managed to avoid this problem for a while.

Fortunately, Go binaries are largely statically linked, with the notable exception of the standard C library. The scope of the problem would be much bigger if it involved several other dynamic libraries, like in the case of C or C++ programs.

Potential options

In theory, the easiest solution is to build the toolbox binary against the oldest supported run-time environment so that it doesn’t rely on newer symbols. However, it’s easier said than done.

Usually downstream distributors use build environments that are composed of components that are part of that specific version of the distribution. For example, it would be unusual for an RPM for a certain Fedora version to be deliberately built against a run-time from an older Fedora. Carlos O’Donell had an interesting idea on how to implement this in Fedora by only ever building for the oldest supported branch, adding a noautobuild file to disable the mass rebuild automation, and having newer branches always inherit the builds from the oldest one. However, this won’t work either. Building against the oldest supported Fedora won’t be enough for Fedora’s Toolbx because, by definition, Toolbx is meant to run different kinds of containers on hosts. The oldest supported Fedora hosts might still be too new compared to containers of supported Debian, Red Hat Enterprise Linux, Ubuntu, etc. versions.

So, yes, in theory, this is the easiest solution, but, in practice, it requires a non-trivial amount of cross-distribution collaboration, and downstream build system and release engineering effort.

The second option is to have Toolbx containers provide their own toolbox binary that’s compatible with the run-time environment of the container. This would substantially complicate the communication between the toolbox binaries on the hosts and the ones inside the containers, because the binaries on the hosts and containers will no longer be exactly the same. The communication channel between commands like toolbox create and toolbox enter running on the hosts, and toolbox init-container inside the containers can no longer use a private and unstable interface that can be easily modified as necessary. Instead, it would have complicated backwards and forwards compatibility requirements. Other than that, it would complicate bug reports, and every single container on a host may need to be updated separately to fix bugs, with updates needing to be co-ordinated across downstream distributors.

The next option is to either statically link against the standard C library, or disable its use in Go. However, that would prevent us from using glibc’s Name Service Switch to look up usernames and groups, or to resolve host names. The replacement code, written in pure Go, can’t handle enterprise set-ups involving Network Information Service and Lightweight Directory Access Protocol, nor can it talk to host OS services like SSSD, systemd-userdbd or systemd-resolved.

It’s true that Toolbx currently doesn’t support enterprise set-ups with NIS and LDAP, but not using NSS will only make it more difficult to add that support in future. Similarly, we don’t resolve any host names at the moment, but given that we are in the business of pulling content over the network, it can easily become necessary in the future. Disabling the use of NSS will leave the toolbox binary as this odd thing that behaves differently from the rest of the OS for some fundamental operations.

An extension of the previous option is to split the toolbox executable into two. One dynamically linked against the standard C library for the hosts, and another that has no dynamic linkage to run inside the containers as their entry point. This can impact backwards compatibility and affect the developer experience of hacking on Toolbx.

Existing Toolbx containers want to bind mount the toolbox executable from the host to /usr/bin/toolbox inside the containers and run toolbox init-container as their entry point. This can’t be changed because of the immutability of OCI containers, and Toolbx simply can’t afford to break existing containers in a way where they can no longer be entered. This means that the toolbox executable needs to become a shim, without any dynamic linkage, that forwards the invocation to the right executable depending on whether it’s running on the hosts or inside the containers.

That brings us to the developer experience of hacking on Toolbx. The first thing to note is that we don’t want to go back to using POSIX shell to implement the executable that’s meant to run inside the container. Ondřej spent a lot of effort replacing the POSIX shell implementation of Toolbx, and we don’t want to undo any part of that. Ideally, we would use the same programming language (i.e., Go) to implement both executables so that one doesn’t need to learn multiple disparate languages to work on Toolbx. However, even if we do use Go, we would have to be careful not to share code across the two executables, or be aware that they may have subtle differences in behaviour depending on how they might be linked.

Then there’s the developer experience of hacking on Toolbx on Fedora Silverblue and similar OSTree-based OSes, which is what you would do to eat your own dog food. Experiences are always subjective and this one is unique to hacking Toolbx inside a Toolbx. So let’s take a moment to understand the situation.

On OSTree-based OSes, Toolbx containers are used for development, and, generally speaking, it’s better to use container-specific locations invisible to the host as the development prefixes because the generated executables are specific to each container. Executables built on one container may not work on another, and not on the hosts either, because of the run-time problems mentioned above. Plus, it’s good hygiene not to pollute the hosts.

Similar to Flatpak and Podman, Toolbx is a tool that sets up containers. This means that unlike most other executables, toolbox must be on the hosts because, barring the init-container command, it can’t work inside the containers. The easiest way to do this is to have a separate terminal emulator with a host shell, and invoke toolbox directly from Meson’s build directory in $HOME that’s shared between the hosts and the Toolbx containers, instead of installing toolbox to the container-specific development prefixes. Note that this only works because toolbox has always been implemented in programming languages with little to no dynamic linking, and only if you ensure that the Toolbx containers for hacking on Toolbx match the hosts. Otherwise, you might run into the run-time problems mentioned above.
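
As a rough sketch of that host-shell workflow (the path to the built binary inside Meson’s build directory is hypothetical):

$ meson setup builddir
$ meson compile -C builddir
$ ./builddir/toolbox list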

The moment there is one executable invoking another, the executables need to be carefully placed on the file system so that one can find the other one. This means that either the executables need to be installed into development prefixes or that the shim should have special logic to work out the location of the other binary when invoked directly from Meson’s build directory.

The former is a problem because the development prefixes will likely default to container-specific locations invisible from the hosts, preventing the built executables from being trivially invoked from the host. One could have a separate development prefix only for Toolbx that’s shared between the containers and the hosts. However, I suspect that a lot of existing and potential Toolbx contributors would find that irksome. They either don’t know how to set up a prefix manually or don’t want to, and instead use something like jhbuild to do it for them.

The latter requires two different sets of logic depending on whether the shim was invoked directly from Meson’s build directory or from a development prefix. At the very least this would involve locating the second executable from the shim, but could grow into other areas as well. These separate code paths would be crucial enough that they would need to be thoroughly tested. Otherwise, Toolbx hackers and users won’t share the same reality. We could start by running our test suite in both modes, and then meticulously increase coverage, but that would come at the cost of a lengthier test suite.

Failed attempts

Since glibc uses symbol versioning, it’s sometimes possible to use some .symver hackery to avoid linking against newer symbols even when building against a newer glibc. This is what Toolbox used to do to ensure that binaries built against newer glibc versions still ran against older ones. However, this doesn’t defend against changes to the start-up code in glibc, like the one in glibc-2.34 that performed some security hardening.
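
For reference, the classic C-level form of this hackery looks roughly like this. It’s only a sketch: the old symbol version shown is an assumption for x86_64, and plumbing it into a Go build via cgo is more involved:

$ cat symver-compat.c
/* Bind references to the old, pre-glibc-2.32 version of the symbol. */
__asm__(".symver pthread_sigmask,pthread_sigmask@GLIBC_2.2.5");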

Current solution

Alexander Larsson and Ray Strode pointed out that all non-ancient Toolbx containers have access to the hosts’ /usr at /run/host/usr. In other words, Toolbx containers have access to the host run-time environments. So, we decided to ensure that toolbox binaries always run against the host run-time environments.

The toolbox binary has an rpath pointing to the hosts’ libc.so somewhere under /run/host/usr, and its dynamic linker (i.e., PT_INTERP) is changed to the one inside /run/host/usr. Unfortunately, there can only be one PT_INTERP entry inside the binary, so there must be a /run/host on the hosts too for the binary to work on the hosts. Therefore, a /run/host symbolic link is also created on the host pointing to the host’s /.
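
One way to produce such a binary is through external linking flags at build time. This is a sketch, not necessarily the exact flags that Toolbx uses:

$ go build -ldflags '-linkmode external -extldflags "-Wl,-rpath,/run/host/usr/lib64 -Wl,--dynamic-linker=/run/host/usr/lib64/ld-linux-x86-64.so.2"' .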

The toolbox binary now looks like this, both on the hosts and inside the Toolbx containers:

$ ldd /usr/bin/toolbox
    linux-vdso.so.1 (0x00007ffea01f6000)
    libc.so.6 => /run/host/usr/lib64/libc.so.6 (0x00007f6bf1c00000)
    /run/host/usr/lib64/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007f6bf289a000)

It’s been almost a year and thus far this approach has held its own. I am mildly bothered by the presence of the /run/host symbolic link on the hosts, but not enough to lose sleep over it.

Other options

Recently, Robert McQueen brought up the idea of possibly using the Linux kernel’s binfmt_misc mechanism to modify the toolbox binary on the fly. I haven’t explored this in any seriousness, but maybe I will if the current set-up doesn’t work out.

Written by Debarshi Ray

2 October, 2022 at 19:28

Toolbx @ Community Central

At 15:00 UTC today, I will be talking about Toolbx on a new episode of Community Central. It will be broadcast live on BlueJeans Events (formerly Primetime) and the recording will be available on YouTube. I am looking forward to seeing some friendly faces in the audience.

Written by Debarshi Ray

4 August, 2022 at 11:38

Toolbx — bypassing the immutability of OCI containers

This is a deep dive into some of the technical details of Toolbx. I find myself regularly explaining them to various people, so I thought that I should write them down. Feel free to read and comment, or you can also happily ignore it.

The problem

OCI containers are famous for being immutable. Once a container has been created with podman create, its attributes can’t be changed anymore. For example, the bind mounts, the environment variables, the namespaces being used, and all the other attributes that can be specified via options to the podman create command. This means that once there’s a Toolbx, it wouldn’t be possible to give it access to a new set of files from the host if the need arose. The Toolbx would have to be deleted and re-created with access to the new paths.

This is a problem, because a Toolbx is where the user sets up her development and troubleshooting environment. Re-creating a Toolbx might mean reinstalling a number of different packages, tweaking configuration files, redeploying various artifacts and so on. Having to repeat all that in the middle of a long hacking session, just because the container’s attributes need to be tweaked, can be annoying.

This is unlike Flatpak containers, where it’s possible to override the permissions of a Flatpak either persistently through flatpak override or temporarily during flatpak run.
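
For example, with Flatpak one can do the following (the application ID is just an example):

$ flatpak override --user --filesystem=~/Projects org.gnome.Builder
$ flatpak run --filesystem=~/Projects org.gnome.Builder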

Secondly, as the Toolbx code evolves, we want to be able to transparently update existing Toolbxes to enable new features and fix bugs. It would be a real drag if users had to consciously re-create their containers.

The solution

Toolbx bypasses this by using a special entry point for the container. Those inquisitive types who have run podman inspect on a Toolbx container might have noticed that the toolbox executable itself is the container’s entry point.

$ podman inspect --format "{{.Config.Cmd}}" --type container fedora-toolbox-36
toolbox --log-level debug init-container ...

This means that when Toolbx starts a container using podman start, the toolbox init-container command gets run as the first process inside the container. Only after this has run, does the user’s interactive shell get spawned.

Instead of setting up the container entirely through podman create, Toolbx tries to use this reflexive entry point as much as possible. For example, Toolbx doesn’t use podman create --volume /tmp:/tmp to give access to the host’s /tmp inside the container. It bind mounts the entire root filesystem from the host at /run/host in the container with podman create --volume /:/run/host. Then, later when the container is started, toolbox init-container recursively bind mounts the container’s /run/host/tmp to /tmp. Since the container has its own mount namespace, the /run/host and /tmp bind mounts are neatly hidden away from the host.
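
In other words, toolbox init-container effectively does something like this inside the container’s mount namespace (a sketch in mount(8) terms):

$ mount --rbind /run/host/tmp /tmp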

Therefore, if, in the future, additional host locations need to be exposed within the Toolbx, then those can be added to toolbox init-container, and once the user restarts the container after updating the toolbox executable, the new locations will show up inside the existing container. The same applies if the mount parameters of an existing location need to be tweaked, or if a host location needs to be removed from the container.

This is not restricted to just bind mounts from the host. The same approach with toolbox init-container is used to configure as many different aspects of the container as possible. For example, setting up users, keeping the timezone and DNS configuration synchronized with the host, and so on.

Further details

One might wonder how a Toolbx container manages to have a toolbox executable inside it, especially since the toolbox package is not installed within the container. It is achieved by bind mounting the toolbox executable invoked by the user on the host to /usr/bin/toolbox inside the container.
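
Conceptually, it’s as if the container were created with something like this (a sketch, not the exact Podman invocation that Toolbx uses):

$ podman create --volume /usr/bin/toolbox:/usr/bin/toolbox:ro ... registry.fedoraproject.org/fedora-toolbox:36 toolbox init-container ...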

This has some advantages.

There is always only one version of the toolbox executable that’s involved — the one that’s on the host. This means that the exact invocation of toolbox init-container, which is baked into the Toolbx and shows up in podman inspect, is the only interface that needs to be kept stable as the Toolbx code evolves. As long as toolbox init-container can be invoked with that specific command line, everything else can be changed because it’s the same executable on both the host and inside the container.

If the container had a separate toolbox package in it, then the user might have to separately update another executable to get the expected results, and we would have to ensure that different mismatched versions of the executable can work with each other across the host and the container. With a growing number of containers, the former would be a nightmare for the user, while the latter would be almost impossible to test.

Finally, having only one version of the toolbox executable makes it a lot easier for users to file bug reports. There’s only one version to report, not several spread across different environments.

This leads to another problem

Once you let this sink in, you might realize that bind mounting the toolbox executable from the host into the Toolbx means that an executable from a newer or different operating system might be running against an older or different run-time environment inside the container. For example, an executable from a Fedora 36 host might be running inside a Fedora 35 Toolbx, or one from an Arch Linux host inside an Ubuntu container.

This is very unusual. We only expect executables from an older version of an OS to keep working on newer versions of the same OS, but never the other way round, and definitely not across different OSes.

I will leave you with that thought and let you puzzle over it, because it will be the topic of a future post.

Written by Debarshi Ray

22 July, 2022 at 18:48

Toolbx is now on Matrix

Toolbx now has its own room on matrix.org. Point your Matrix clients to #toolbx:matrix.org and join the conversation.

We are working on setting up an IRC bridge with Libera.Chat but that will take a few more months as we go through the process to register our project channel.

Written by Debarshi Ray

24 November, 2021 at 13:27

Toolbx: Red Hat is hiring a software engineer

The Desktop Team at Red Hat wants to hire a software engineer to work full-time on Toolbx (formerly known as Toolbox) with me, and hopefully go on to maintain it in the near future. You will be working upstream and downstream (Fedora and RHEL) to improve the developer and troubleshooting experience on OSTree-based Linux operating systems like Fedora Silverblue and CoreOS, and extend some of the benefits to even traditional package-based OSes like Fedora Workstation.

If you are excited to work across the different layers of a modern Linux operating system, with a focus on container and desktop technologies, and aren’t afraid of getting your hands dirty with C and Go, then please go ahead and apply. Toolbx is a relatively young project with a rapidly growing community, so you are sure to have a fun ride.

Written by Debarshi Ray

15 November, 2021 at 14:46

Toolbox is now Toolbx

Toolbox is being renamed to Container Toolbx or just Toolbx.

I had always been uncomfortable with the generic nature of the term toolbox, and people kept complaining that it’s terribly difficult to search for. Recently, we have been trying to improve the online presence of the project by creating a website and a Twitter handle, and it’s impossible to find any decent Internet real estate with anything toolbox.

It looks like dropping the penultimate character from words to form names is a thing these days, hence Toolbx.

We haven’t yet renamed the Git repository or anything in the code or the binary or the manuals. Renaming the binary, for example, has implications for existing containers, and we don’t want to cause any needless disruption for users. So, those will gradually happen over time with all the necessary compatibility aliases and such.

Meanwhile, Fedora Magazine has published an interview with yours truly about Toolbx that talks about the history, latest improvements, future direction, and various other aspects of the project.

It should be obvious, but the Toolbx website was made by Jakub Steiner.

Written by Debarshi Ray

10 November, 2021 at 12:20

Toolbox — after a gap of 15 months

We just released version 0.0.99, and I realized that it’s been a while since I blogged about Toolbox. So it’s time to address that.

Rewritten in Go

About a year ago, Ondřej Míchal single-handedly rewrote Toolbox in Go, making it massively easier to work on the code compared to the previous POSIX shell implementation. Go comes with much nicer facilities for command line parsing, error handling, logging, parsing JSON, and in general is a lot more pleasant to program in. Plus all the container tools in the OCI ecosystem are written in Go anyway, so it was a natural fit.

Other than the obvious benefits of Go, the rewrite immediately fixed a few bugs that were inherently very cumbersome to fix in the POSIX shell implementation. Something as simple as offering a --version option, or avoiding duplicate entries when listing containers or images, was surprisingly difficult to achieve in the past.

What’s more, we managed to pull this off by retaining full compatibility with the previous code. So users and distributors should have no hesitation to update.

Towards version 0.1.0

We have been very conservative about our versioning scheme so far due to the inherently prototype-like nature of Toolbox. All our release numbers have followed the 0.0.x format. We thought that the move to Go deserved at least a minor version bump, but we also wanted to give it some time to shake out any bugs that might have crept in, and implement the features and fix the bugs that have been on our short-term wish list before putting a 0.1.0 stamp on it.

Therefore, we started a series of 0.0.9x releases to work our way towards version 0.1.0. The first one was 0.0.90, which shipped the Go code in March 2020, and we are currently at 0.0.99. Suffice to say that we are very close to the objective.

Rootful Toolboxes

Sometimes a rootless OCI container just isn’t enough because it can’t do things that require privilege escalation beyond the user’s current user ID on the host. This means that various debugging tools, such as Nmap, don’t work.

Therefore, we added support for running toolbox as root in version 0.0.98.1. This should hopefully unlock various new use-cases that were so far not possible when running rootless.
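
Rootful containers are simply created and entered through sudo (a sketch):

$ sudo toolbox create
$ sudo toolbox enter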

When running as root, Toolbox cannot rely on things like the user’s session D-Bus instance or the XDG_RUNTIME_DIR environment variable, because sudo doesn’t create a full-fledged user session that offers them. This means that graphical applications can only work by connecting to an X11 server, but then again running graphical applications as root is never a good idea to begin with.

Red Hat Universal Base Image (or UBI)

We recently took the first step towards supporting operating system distributions other than Fedora as first-class citizens. From version 0.0.99 onwards, Toolbox supports Red Hat Enterprise Linux hosts, where it will create containers based on the Red Hat Universal Base Image by default.

On hosts that aren’t running RHEL, one can still create UBI containers as:
$ toolbox create --distro rhel --release 8.3

Read more

Those were some of the big things that have happened in Toolbox land since my last update. If you are interested in more details, then you can read Ondřej’s posts where he writes at length about the port to Go and the changes in each of the releases since then.

Written by Debarshi Ray

14 January, 2021 at 22:49