Debarshi's den

Toolbx @ Community Central

At 15:00 UTC today, I will be talking about Toolbx on a new episode of Community Central. It will be broadcast live on BlueJeans Events (formerly Primetime) and the recording will be available on YouTube. I am looking forward to seeing some friendly faces in the audience.

Written by Debarshi Ray

4 August, 2022 at 11:38

Toolbx — bypassing the immutability of OCI containers

This is a deep dive into some of the technical details of Toolbx. I find myself regularly explaining them to various people, so I thought that I should write them down. Feel free to read and comment, or you can also happily ignore it.

The problem

OCI containers are famous for being immutable. Once a container has been created with podman create, its attributes can’t be changed anymore: the bind mounts, the environment variables, the namespaces being used, and everything else that can be specified via options to the podman create command. This means that once there’s a Toolbx, it wouldn’t be possible to give it access to a new set of files from the host if the need arose. The Toolbx would have to be deleted and re-created with access to the new paths.

This is a problem, because a Toolbx is where the user sets up her development and troubleshooting environment. Re-creating a Toolbx might mean reinstalling a number of different packages, tweaking configuration files, redeploying various artifacts and so on. Having to repeat all that in the middle of a long hacking session, just because the container’s attributes need to be tweaked, can be annoying.

This is unlike Flatpak containers, where it’s possible to override the permissions of a Flatpak either persistently through flatpak override or temporarily during flatpak run.

Secondly, as the Toolbx code evolves, we want to be able to transparently update existing Toolbxes to enable new features and fix bugs. It would be a real drag if users had to consciously re-create their containers.

The solution

Toolbx bypasses this by using a special entry point for the container. Those inquisitive types who have run podman inspect on a Toolbx container might have noticed that the toolbox executable itself is the container’s entry point.

$ podman inspect --format "{{.Config.Cmd}}" --type container fedora-toolbox-36
toolbox --log-level debug init-container ...

This means that when Toolbx starts a container using podman start, the toolbox init-container command gets run as the first process inside the container. Only after this has run does the user’s interactive shell get spawned.

Instead of setting up the container entirely through podman create, Toolbx tries to use this reflexive entry point as much as possible. For example, Toolbx doesn’t use podman create --volume /tmp:/tmp to give access to the host’s /tmp inside the container. It bind mounts the entire root filesystem from the host at /run/host in the container with podman create --volume /:/run/host. Then, later when the container is started, toolbox init-container recursively bind mounts the container’s /run/host/tmp to /tmp. Since the container has its own mount namespace, the /run/host and /tmp bind mounts are neatly hidden away from the host.
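The two-step mount described above can be sketched in C. This is only an illustration under stated assumptions: Toolbx itself is written in Go, and the helper names host_source_path and expose_host_path are hypothetical. The only real API used is the Linux mount(2) call with MS_BIND | MS_REC, which needs sufficient privileges inside the container’s mount namespace.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Where the host's root filesystem is visible inside the container. */
#define HOST_PREFIX "/run/host"

/* Hypothetical helper: write "/run/host" + host_path into buf.
 * Returns 0 on success, or -1 if buf is too small. */
int
host_source_path (const char *host_path, char *buf, size_t buf_size)
{
  assert (host_path != NULL && host_path[0] == '/');

  if (strlen (HOST_PREFIX) + strlen (host_path) + 1 > buf_size)
    return -1;

  snprintf (buf, buf_size, "%s%s", HOST_PREFIX, host_path);
  return 0;
}

/* Hypothetical helper: recursively bind mount /run/host/<path> at <path>.
 * Returns 0 on success, or -1 on failure, like mount(2) itself. */
int
expose_host_path (const char *host_path)
{
  char source[4096];

  if (host_source_path (host_path, source, sizeof source) != 0)
    return -1;

  return mount (source, host_path, NULL, MS_BIND | MS_REC, NULL);
}
```

With this shape, exposing another host location later is one more expose_host_path call at container start-up, with no change to how the container was created.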

Therefore, if additional host locations need to be exposed within the Toolbx in the future, they can be added to toolbox init-container, and once the user restarts the container after updating the toolbox executable, the new locations will show up inside the existing container. The same applies if the mount parameters of an existing location need to be tweaked, or if a host location needs to be removed from the container.

This is not restricted to just bind mounts from the host. The same approach with toolbox init-container is used to configure as many different aspects of the container as possible, such as setting up users and keeping the timezone and DNS configuration synchronized with the host.

Further details

One might wonder how a Toolbx container manages to have a toolbox executable inside it, especially since the toolbox package is not installed within the container. It is achieved by bind mounting the toolbox executable invoked by the user on the host to /usr/bin/toolbox inside the container.

This has some advantages.

There is always only one version of the toolbox executable that’s involved — the one that’s on the host. This means that the exact invocation of toolbox init-container, which is baked into the Toolbx and shows up in podman inspect, is the only interface that needs to be kept stable as the Toolbx code evolves. As long as toolbox init-container can be invoked with that specific command line, everything else can be changed because it’s the same executable on both the host and inside the container.

If the container had a separate toolbox package in it, then the user might have to separately update another executable to get the expected results, and we would have to ensure that different mismatched versions of the executable can work with each other across the host and the container. With a growing number of containers, the former would be a nightmare for the user, while the latter would be almost impossible to test.

Finally, having only one version of the toolbox executable makes it a lot easier for users to file bug reports. There’s only one version to report, not several spread across different environments.

This leads to another problem

Once you let this sink in, you might realize that bind mounting the toolbox executable from the host into the Toolbx means that an executable from a newer or different operating system might be running against an older or different run-time environment inside the container. For example, an executable from a Fedora 36 host might be running inside a Fedora 35 Toolbx, or one from an Arch Linux host inside an Ubuntu container.

This is very unusual. We only expect executables from an older version of an OS to keep working on newer versions of the same OS, but never the other way round, and definitely not across different OSes.

I will leave you with that thought and let you puzzle over it, because it will be the topic of a future post.

Written by Debarshi Ray

22 July, 2022 at 18:48

Toolbx is now on Matrix

Toolbx now has its own room on matrix.org. Point your Matrix clients to #toolbx:matrix.org and join the conversation.

We are working on setting up an IRC bridge with Libera.Chat but that will take a few more months as we go through the process to register our project channel.

Written by Debarshi Ray

24 November, 2021 at 13:27

Toolbx: Red Hat is hiring a software engineer

The Desktop Team at Red Hat wants to hire a software engineer to work full-time on Toolbx (formerly known as Toolbox) with me, and hopefully go on to maintain it in the near future. You will be working upstream and downstream (Fedora and RHEL) to improve the developer and troubleshooting experience on OSTree-based Linux operating systems like Fedora Silverblue and CoreOS, and extend some of the benefits to even traditional package-based OSes like Fedora Workstation.

If you are excited to work across the different layers of a modern Linux operating system, with a focus on container and desktop technologies, and aren’t afraid of getting your hands dirty with C and Go, then please go ahead and apply. Toolbx is a relatively young project with a rapidly growing community, so you are sure to have a fun ride.

Written by Debarshi Ray

15 November, 2021 at 14:46

Toolbox is now Toolbx

Toolbox is being renamed to Container Toolbx or just Toolbx.

I had always been uncomfortable with the generic nature of the term toolbox, and people keep complaining that it’s terribly difficult to search for. Recently, we have been trying to improve the online presence of the project by creating a website and a Twitter handle, and it’s impossible to find any decent Internet real estate with anything toolbox.

It looks like dropping the penultimate character from words to form names is a thing these days, hence Toolbx.

We haven’t yet renamed the Git repository or anything in the code or the binary or the manuals. Renaming the binary, for example, has implications for existing containers, and we don’t want to cause any needless disruption for users. So, those will gradually happen over time with all the necessary compatibility aliases and such.

Meanwhile, Fedora Magazine has published an interview with yours truly about Toolbx that talks about the history, latest improvements, future direction, and various other aspects of the project.

It should be obvious, but the Toolbx website was made by Jakub Steiner.

Written by Debarshi Ray

10 November, 2021 at 12:20

Toolbox — After a gap of 15 months

We just released version 0.0.99, and I realized that it’s been a while since I blogged about Toolbox. So it’s time to address that.

Rewritten in Go

About a year ago, Ondřej Míchal single-handedly rewrote Toolbox in Go, making it massively easier to work on the code compared to the previous POSIX shell implementation. Go comes with much nicer facilities for command line parsing, error handling, logging, parsing JSON, and in general is a lot more pleasant to program in. Plus all the container tools in the OCI ecosystem are written in Go anyway, so it was a natural fit.

Other than the obvious benefits of Go, the rewrite immediately fixed a few bugs that were inherently very cumbersome to fix in the POSIX shell implementation. Something as simple as offering a --version option, or avoiding duplicate entries when listing containers or images, was surprisingly difficult to achieve in the past.

What’s more, we managed to pull this off by retaining full compatibility with the previous code. So users and distributors should have no hesitation to update.

Towards version 0.1.0

We have been very conservative about our versioning scheme so far due to the inherently prototype nature of Toolbox. All our release numbers have followed the 0.0.x format. We thought that the move to Go deserves at least a minor version bump, but we also wanted to give it some time to shake out any bugs that might have crept in; and implement the features and fix the bugs that have been on our short-term wish list before putting a 0.1.0 stamp on it.

Therefore, we started a series of 0.0.9x releases to work our way towards version 0.1.0. The first one was 0.0.90 which shipped the Go code in March 2020, and we are currently at 0.0.99. Suffice it to say that we are very close to the objective.

Rootful Toolboxes

Sometimes a rootless OCI container just isn’t enough because it can’t do things that require privilege escalation beyond the user’s current user ID on the host. This means that various debugging tools, such as Nmap, don’t work.

Therefore, we added support for running toolbox as root in version 0.0.98.1. This should hopefully unlock various new use-cases that were so far not possible when running rootless.

When running as root, Toolbox cannot rely on things like the user’s session D-Bus instance or the XDG_RUNTIME_DIR environment variable, because sudo doesn’t create a full-fledged user session that offers them. This means that graphical applications can only work by connecting to an X11 server, but then again running graphical applications as root is never a good idea to begin with.

Red Hat Universal Base Image (or UBI)

We recently took the first step towards supporting operating system distributions other than Fedora as first class citizens. From version 0.0.99 onwards, Toolbox supports Red Hat Enterprise Linux hosts where it will create containers based on the Red Hat Universal Base Image by default.

On hosts that aren’t running RHEL, one can still create UBI containers as:
$ toolbox create --distro rhel --release 8.3

Read more

Those were some of the big things that have happened in Toolbox land since my last update. If you are interested in more details, then you can read Ondřej’s posts where he writes at length about the port to Go and the changes in each of the releases since then.

Written by Debarshi Ray

14 January, 2021 at 22:49

Toolbox — A fall 2019 update

Things have been moving fast in Toolbox land, and it’s time to talk about what we have been doing lately.

New home

Toolbox is now part of the containers organization on GitHub. We felt that the project had outgrown the prototype stage — going by the activity on the GitHub project it’s safe to say that there are at least a few thousand users who rely on it to get their work done; and we are increasingly working towards expanding the scope of the project to go beyond just setting up a development environment.

Housing the project in my personal GitHub namespace meant that I couldn’t share admin access with other contributors, and this was a problem we had to address as more and more people keep joining the project. Over the past year, we have developed a really good working relationship with the Podman team and other members of the containers organization, without whom Toolbox wouldn’t exist, so moving in under the same umbrella felt like a natural next step towards growing the project.

Migration to cgroups v2

Fedora 31 ships with cgroups v2 by default. The major blocker for cgroups v2 adoption so far was the lack of support in the various container and virtualization tools, including the Podman stack. Since Toolbox containers are just OCI containers managed with Podman, we saw some action too.

After updating the host operating system to Fedora 31, Toolbox will try to migrate your existing containers to work with cgroups v2. Sadly, this is a somewhat complicated move, and in theory it’s possible that the migration might break some containers depending on how they were configured. So far, as per our testing, it seems that containers created by Toolbox do get smoothly migrated, so hopefully you won’t notice.

However, if things go wrong, barring a delicate surgery on the container requiring some pretty arcane knowledge, your only option might be to do a factory reset of your local Podman installation. As factory resets go, you will lose all your existing OCI containers and images on your local system. This is a sad outcome for those unfortunate enough to encounter it. However, if you do find yourself in this quagmire then take a look at the toolbox reset command.

Note that you need to have podman-1.6.2 and toolbox-0.0.16 for the above to work.

Also, this is one of those changes where it bears repeating that online RPM package updates are fragile. They are officially unsupported on Fedora Workstation, and variants like CoreOS and Silverblue make it even harder. A cgroups v2 migration is only expected to work on a freshly booted system.

Improvements

The last six months have seen a whole boatload of new features and improvements. Here are some highlights.

On Fedora Silverblue and Workstation, GNOME Terminal keeps track of the current Toolbox container, and just like it preserves the current working directory when opening a new terminal, it’s also able to preserve the Toolbox environment. This is quite convenient when hacking on a Silverblue system, because it removes the extra step of entering a toolbox after opening a new tab or window.

The integration with the host operating system has been deepened. Toolbox containers can now access virtual machines managed by the host’s system libvirt instance, and the host’s ulimits are preserved. The entirety of /dev is made available inside the toolbox as a step towards supporting the proprietary Nvidia driver to enable CUDA for AI/ML frameworks like TensorFlow.

The container’s /run/host now has big chunks of the host’s file hierarchy. This is handy for one-off use-cases which require access to parts of the host that aren’t covered by Toolbox by default.

Last but not least, Kerberos now works inside Toolbox containers. This will make it easier to contribute to Fedora itself from inside a toolbox.

Written by Debarshi Ray

1 November, 2019 at 21:53

Fedora Toolbox is now just Toolbox

Fedora Toolbox has been renamed to just Toolbox. Even though the project is obviously driven by the needs of Fedora Silverblue and uses technologies like Buildah and Podman that are driven by members of the wider Fedora project, it was felt that a toolbox container is a generic concept that appeals to many more communities than just Fedora. You can also think of it as a nod to coreos/toolbox which served as the original inspiration for the project, and there are plans to use it in Fedora CoreOS too.

If you’re curious, here’s a subset of the discussion that drove the renaming.

There have already been two releases with the new name, so I assume that almost all users have been migrated.

Note that the name of the base OCI image for creating Fedora toolbox containers is still fedora-toolbox for obvious namespacing reasons, but the names of the client-side command line tool, and the overall project itself have changed. That way you could have a debian-toolbox, a centos-toolbox and so on.

It should be obvious, but the Toolbox logo was designed and created by Jakub Steiner.

Written by Debarshi Ray

3 April, 2019 at 19:39

About -Wextra and -Wcast-function-type

About eight months ago, around the time when GCC 8.x started showing up on my computers, I started moving my code away from using -Wextra. This aligns nicely with the move to the Meson build system, but goes against the flow of Autotools’ AX_COMPILER_FLAGS, which isn’t ideal but is an acceptable trade-off.

But why?

GCC 8.x added a warning called -Wcast-function-type to the -Wextra umbrella. It warns when a function pointer is cast to an incompatible function type. At a glance, this seems desirable, but it isn’t. It runs contrary to one of the widely used C idioms in GNOME. For example, it’s triggered by this textbook use of g_list_copy_deep to copy a list of reference counted objects:

another_list = g_list_copy_deep (list, (GCopyFunc) g_object_ref, NULL);

Note that this is different from -Wincompatible-pointer-types, which would’ve triggered if the cast to GCopyFunc was missing:

another_list = g_list_copy_deep (list, g_object_ref, NULL);

It’s easy to imagine similar examples with uses of gtk_container_forall or gtk_container_foreach with gtk_widget_destroy, and so on.

The C standard (e.g., see section 6.5.2.2 of the C11 standard) steps around the issue of passing more arguments to a function than it actually has parameters for by calling it undefined behaviour. However, the calling conventions of all the platforms supported by GLib are defined in a way to make this work.

So how do we disable -Wcast-function-type?

One option is to use a compiler directive with #pragma as suggested by AX_COMPILER_FLAGS. However, attempts to ignore it through a #pragma on older versions of GCC that didn’t have this specific warning will trigger -Wpragmas, and, ironically, using G_GNUC_CHECK_VERSION to conditionally disable it on newer compilers will trigger -Wexpansion-to-defined, again, because of -Wextra and the fact that the implementation of the macro has a #ifdef around __GNUC__. Regardless, for a warning that gets triggered by such a widely used programming construct, it would lead to a ton of boilerplate all over the codebase, instead of being a solitary exception tucked away in one corner of the project.

Therefore, my preference has been to append -Wno-cast-function-type and -Wno-error=cast-function-type to the list of compiler flags of modules using -Wextra. This avoids almost all of the above problems. One small wrinkle is that if a translation unit or file does trigger some other unrelated diagnostic, then an older compiler will also emit -Wunknown-warning for the presence of the unknown -Wno-cast-function-type flag. I find this acceptable because, in the first place, a file shouldn’t trigger any other diagnostic, especially on an older compiler, and if there does happen to be something, say a deprecation warning, then it’s likely something that needs to be fixed anyway.

Given that this can repeat with future versions of GCC, it seems wiser to avoid -Wextra and instead explicitly list out the desired compiler warnings. This isn’t hard to do because the GCC documentation clearly marks which warnings are turned on by -Wextra, -Wpedantic, etc.

Written by Debarshi Ray

1 April, 2019 at 13:05

Posted in C, GNOME, GNU

GNOME Photos: an overview of zooming

I was recently asked about how zooming works in GNOME Photos, and given that I spent an inordinate amount of time getting the details right, I thought I should write it down. Feel free to read and comment, or you can also happily ignore it.

Smooth zooming

One thing that I really wanted from the beginning was smooth zooming. When the user clicks one of the zoom buttons or presses a keyboard shortcut, the displayed image should smoothly flow in and out instead of jumping to the final zoom level. This is similar to the way the image smoothly shrinks to make way for the palette when editing, and expands outwards once done. See this animated mock-up from Jimmac to get an idea.

For the zooming to be smooth, we need to generate a number of intermediate zoom levels to fill out the frames in the animation. We have to dish out something in the ballpark of sixty different levels every second to be perceived as smooth because that’s the rate at which most displays refresh their screens. This would have been easier with the 5 to 20 megapixel images generated by smart-phones and consumer-grade digital SLRs; but just because we want things to be sleek, it doesn’t mean we want to limit ourselves to the ordinary! There is high-end equipment out there producing images in excess of a hundred megapixels and we want to robustly handle those too.

Downscaling by large factors is tricky. When we are aiming to generate sixty frames per second, there’s less than 16.67 milliseconds for each intermediate zoom level. All we need is a slightly big zoom factor that stresses the CPU and main memory just enough to exceed our budget and break the animation. It’s a lot more likely to happen than a pathological case that crashes the process or brings the system to a halt.
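The per-frame budget quoted above is simple arithmetic, sketched here in C. The function names frame_budget_ms and frame_fits_budget are hypothetical and are not taken from GNOME Photos.

```c
#include <assert.h>

/* Time available to produce one frame, in milliseconds,
 * for a display refreshing refresh_hz times per second. */
double
frame_budget_ms (double refresh_hz)
{
  assert (refresh_hz > 0.0);
  return 1000.0 / refresh_hz;
}

/* Non-zero if a frame that took render_ms to produce fits in the budget. */
int
frame_fits_budget (double render_ms, double refresh_hz)
{
  return render_ms < frame_budget_ms (refresh_hz);
}
```

At 60 Hz the budget comes out at roughly 16.67 milliseconds, so a 10 millisecond frame fits while a 20 millisecond frame would drop out of the animation.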

Mipmaps to the rescue!

A 112.5 megapixel or 12500×9000 image being smoothly zoomed in and out on an Intel Kaby Lake i7 with a HiDPI display. At the given window size, the best fit zoom level is approximately 10%. On a LoDPI display it would’ve been 5%. Note that simultaneously encoding the screencast consumes enough extra resources to make it stutter a bit. That’s not the case otherwise.

Photos uses GEGL to deal with images, and image pixels are held in GeglBuffers. Each GeglBuffer implicitly supports 8 mipmap levels. In other words, a GeglBuffer not only has the image pixels at the original resolution, or level zero, at which they were fed into the buffer, but it also caches progressively lower resolution representations of it. For example, at 50% or level one, at 25% or level two, and so on.

This means that we never downscale by more than a factor of two during an animation. If we want to zoom an image down to 30%, we take the first mipmap level, which is already cached at 50%, and from there on it’s just another 60% to reach the originally intended zoom target of 30%. Knowing that we won’t ever have to downscale by more than a factor of two in a sensitive code path is a relief.
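The level selection described above can be sketched in C. It is not the GEGL or Photos implementation: MipmapPlan and mipmap_plan_for_zoom are hypothetical names, and the sketch assumes the 8 mipmap levels are numbered 0 through 7, with level n cached at 1/2^n of the original resolution.

```c
#include <assert.h>

/* Assumption: 8 mipmap levels, numbered 0 (original) through 7. */
#define MAX_MIPMAP_LEVEL 7

typedef struct
{
  int level;        /* mipmap level to read pixels from */
  double residual;  /* scale still to be applied on top of that level */
} MipmapPlan;

/* Pick the smallest cached level that leaves less than a 2x downscale. */
MipmapPlan
mipmap_plan_for_zoom (double zoom)
{
  MipmapPlan plan = { 0, zoom };

  assert (zoom > 0.0);

  /* Each step moves to the next cached level, which is half the size,
   * so the scale that remains to be applied doubles. */
  while (plan.level < MAX_MIPMAP_LEVEL && plan.residual <= 0.5)
    {
      plan.level++;
      plan.residual *= 2.0;
    }

  return plan;
}
```

For a zoom of 0.30 this picks level 1 with a residual scale of 0.6, matching the 30% example. For a zoom of 0.10 it picks level 3, cached at 12.5%, with a residual of 0.8.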

But that’s still not enough.

It doesn’t take long to realize that the user barely catches a fleeting glimpse of the intermediate zoom levels. So, we cut corners by using the fast but low quality nearest neighbour sampler for those; and only use a higher quality box or bilinear sampler, depending on the specific zoom level, for the final image that the user will actually see.

With this set-up in place, on the Intel Kaby Lake i7 machine used in the above video, it consistently takes less than 10 milliseconds for the intermediate frames, and less than 26 milliseconds for the final high quality frame. On an Intel Sandy Bridge i7 with a LoDPI display it takes less than 5 and 15 milliseconds respectively, because there are fewer pixels to pump. On average it’s a lot faster than these worst case figures. You can measure for yourself using the GNOME_PHOTOS_DEBUG environment variable.

A lot of the above was enabled by Øyvind Kolås’ work on GEGL. Donate to his fund-raiser if you want to see more of this.

There’s some work to do for the HiDPI case, but it’s already fast enough to be perceived as smooth by a human. Look at the PhotosImageView widget if you are further interested.

An elastic zoom gesture

While GTK already comes with a gesture for recognizing pinch-to-zoom, it doesn’t exactly match the way we handle keyboard, mouse and touch pad events for zooming. Specifically, I wanted the image to snap back to its best fit size if the user tried to downscale beyond it using a touch screen. You can’t do that with any other input device, so it makes sense that it shouldn’t be possible with a touch screen either. The rationale being that Photos is optimized for photographic content, which is best viewed at its best fit or natural size.

For this elastic behaviour to work, the semantics of how GtkGestureZoom calculates the zoom delta had to be reworked. Every time the direction of the fingers changes, the reference separation between the touch points relative to which the delta is computed must be reset to the current distance between them. Otherwise, if the fingers change direction after having moved past the snapping point, the image will abruptly jump instead of sticking to the fingers.
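The reference reset can be sketched in plain C, without GTK. This is not the PhotosGestureZoom code: the ElasticZoom type and its functions are hypothetical, and the sketch assumes the snapping is a hard clamp at the best fit zoom, which simplifies whatever the real widget does.

```c
#include <assert.h>

typedef struct
{
  double best_fit;   /* smallest zoom the image may be shown at */
  double base_zoom;  /* zoom on screen when the reference was last set */
  double reference;  /* finger separation the delta is measured against */
  double previous;   /* finger separation at the previous update */
  int direction;     /* +1 spreading, -1 pinching, 0 not yet known */
  double zoom;       /* zoom currently on screen */
} ElasticZoom;

void
elastic_zoom_begin (ElasticZoom *ez, double zoom, double best_fit,
                    double separation)
{
  assert (ez != 0 && best_fit > 0.0 && separation > 0.0);
  ez->best_fit = best_fit;
  ez->base_zoom = zoom;
  ez->zoom = zoom;
  ez->reference = separation;
  ez->previous = separation;
  ez->direction = 0;
}

/* Feed the current finger separation; returns the zoom to display. */
double
elastic_zoom_update (ElasticZoom *ez, double separation)
{
  int direction = ez->direction;
  double zoom;

  assert (ez != 0 && separation > 0.0);

  if (separation > ez->previous)
    direction = 1;
  else if (separation < ez->previous)
    direction = -1;

  /* The fingers turned around: measure from here, starting at the
   * zoom that is actually on screen. */
  if (ez->direction != 0 && direction != ez->direction)
    {
      ez->reference = separation;
      ez->base_zoom = ez->zoom;
    }

  ez->direction = direction;
  ez->previous = separation;

  zoom = ez->base_zoom * (separation / ez->reference);
  if (zoom < ez->best_fit)
    zoom = ez->best_fit;  /* assumed hard clamp at the best fit level */

  ez->zoom = zoom;
  return zoom;
}
```

Starting at a best fit zoom of 0.5 with the fingers 200 units apart, pinching to 100 keeps the zoom at 0.5. Reversing to 110 resets the reference, so spreading on to 220 gives a zoom of 1.0. Without the reset, the zoom at 220 would be only 0.55, and the image would lag behind the fingers.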

The image refuses to become smaller than the best fit zoom level and snaps back. Note that simultaneously encoding the screencast consumes enough extra resources to make it stutter a bit. That’s not the case otherwise.

With some help from Carlos Garnacho, we have a custom gesture that hooks into GtkGestureZoom’s begin and update signals to implement the above. The custom gesture is slightly awkward because GtkGestureZoom is a final class and can’t be derived, but it’s not too bad for a prototype. It’s called PhotosGestureZoom, in case you want to look it up.

The screencasts feature a 112.5 megapixel or 12500×9000 photo of hot air balloons at ClovisFest taken by Soulmates Photography / Daniel Street available under the Creative Commons Attribution-Share Alike 3.0 Unported license.

The touch points were recorded in an X session with a tool written by Carlos Garnacho.

Written by Debarshi Ray

8 February, 2019 at 18:36

Posted in C, Fedora, GEGL, GNOME, GTK+, Photos