Debarshi's den

Google Drive and GNOME — what is a volatile path?

TL;DR: If you are using any GFile API that can create a new file or directory, then please take care of “volatile” paths. Look at standard::is-volatile, g_file_output_stream_query_info and g_file_query_info. Read further if you don’t trust me, or if you develop file management software.

Like other cloud storage services, Drive is database-based. Every file or folder is identified by an opaque, server-generated, immutable blob (say “0B9J_DkziBDRFTj1TRWLPam9kbnc”), and has a human-readable title that can be displayed in the user interface (say “Summer Vacation”). This is unlike POSIX filesystems, where a file is identified by its path and, barring encoding issues, the basename of that path is its name, i.e., the name of the file “/home/alice/Summer\ Vacation” is “Summer Vacation”. It sounds like an innocuous distinction at first, but it is at the heart of the issue.

Let’s look at a few operations to get a feel for this.

Rename

POSIX:

mv /home/alice/foo /home/alice/bar

We need the paths to carry out the operation, and once it has finished successfully, the file’s path changes. Since files are identified by their paths, two files called “bar” can never coexist in the same location: if one already exists, the rename either replaces it or is refused.

Google Drive:

set_title (id_of_foo, bar)

We only need the identifier (or ID) and the new title for the operation and once successfully finished, the title changes but the ID remains the same. Since IDs are unique and immutable, you can have two files with the same title in the same location. That’s a bit strange, you might think, but not so much.

Creating a New File

POSIX:

touch /home/alice/foo

If you want a file named “foo” in a certain location, barring encoding issues, you ask for a path that has “foo” appended to the path of the location. A new file is created only if there isn’t already a “foo” in that location; otherwise the path simply refers to the existing file.

Google Drive:

create_file (id_of_parent_folder, foo) → id_of_new_file

Notice that we cannot specify the ID of the new file. We provide a title and the server generates the ID for it. Again, there is no restriction on having two files with the same title in the same location. This is where it starts getting weird.
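To make the contrast concrete, here is a minimal sketch of an ID-based store. It is a toy in-memory model, not the Google Drive API: the class ToyDrive and its methods are hypothetical names that mirror the create_file and set_title pseudo-calls above, and the random token merely stands in for a server-generated ID.

```python
import secrets


class ToyDrive:
    """Hypothetical in-memory stand-in for an ID-based store (not the Drive API)."""

    def __init__(self):
        self.root_id = "root"
        # Maps id -> (parent_id, title). The ID is the only key.
        self.entries = {self.root_id: (None, "")}

    def create_file(self, parent_id, title):
        if parent_id not in self.entries:
            raise KeyError(parent_id)
        # The caller cannot choose the ID; the "server" generates it.
        new_id = secrets.token_urlsafe(12)
        self.entries[new_id] = (parent_id, title)
        return new_id

    def set_title(self, file_id, title):
        # Renaming touches only the title; the ID stays the same.
        parent_id, _ = self.entries[file_id]
        self.entries[file_id] = (parent_id, title)

    def children(self, parent_id):
        return {i: t for i, (p, t) in self.entries.items() if p == parent_id}


drive = ToyDrive()
a = drive.create_file(drive.root_id, "foo")
b = drive.create_file(drive.root_id, "foo")  # same title, same folder: allowed
drive.set_title(a, "bar")
drive.set_title(b, "bar")                    # still no conflict
print(sorted(drive.children(drive.root_id).values()))  # → ['bar', 'bar']
```

Nothing in this model ever compares titles, which is why duplicates are harmless to the store itself and only become a problem once something tries to use the title as an address.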

GIO and GVfs

The file handling APIs in GIO are based on the POSIX model, with some room to accommodate minor deviations. They are ill-equipped to deal with a case like this.

As a quick aside, both MTP and PTP are somewhat similar in this regard. Both use IDs to refer to objects (think of them as files) stored on the phone or camera, and we have backends implementing them in GVfs. These backends construct POSIX-like paths using the human-readable titles of the files, and have an internal cache mapping IDs to paths and vice-versa. Easy enough, but there are two important differences:

  • MTP allows no parallelism. You can only perform one operation at a time: a new one won’t start until the earlier one has finished. So, the GVfs backend can assume complete control over the storage. Google Drive, on the other hand, is massively parallel. You might be modifying it via GVfs, via a web browser, from a different computer, or someone might be sharing something with you from yet another computer. It is better to rely on the server-generated IDs to identify files as much as possible, and let the server worry about keeping the data consistent.
  • MTP and PTP are usually backed by a traditional filesystem on the device, e.g., FAT12, FAT16, FAT32, exFAT or ext3. Therefore the issue of duplicate titles doesn’t arise.
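The ID-to-path cache described for the MTP and PTP backends can be sketched roughly as follows. This is an illustrative toy, not the GVfs code: build_path_cache and the sample object table are hypothetical, and the sketch simply assumes that titles are unique within a folder, which is the property the second point above relies on.

```python
def build_path_cache(objects):
    """Build both directions of an ID <-> path cache.

    objects maps id -> (parent_id or None, title).
    Raises ValueError if two objects would end up with the same path.
    """
    id_to_path = {}

    def path_of(obj_id):
        if obj_id in id_to_path:
            return id_to_path[obj_id]
        parent_id, title = objects[obj_id]
        prefix = "" if parent_id is None else path_of(parent_id)
        id_to_path[obj_id] = prefix + "/" + title
        return id_to_path[obj_id]

    for obj_id in objects:
        path_of(obj_id)

    path_to_id = {}
    for obj_id, path in id_to_path.items():
        if path in path_to_id:
            # Two objects with the same title in one folder: the POSIX-like
            # path can no longer identify a single object.
            raise ValueError(f"duplicate path {path!r}")
        path_to_id[path] = obj_id
    return id_to_path, path_to_id


camera = {1: (None, "DCIM"), 2: (1, "IMG_0001.jpg")}
id_to_path, path_to_id = build_path_cache(camera)
print(id_to_path[2])                     # → /DCIM/IMG_0001.jpg
print(path_to_id["/DCIM/IMG_0001.jpg"])  # → 2
```

With duplicate titles in one folder, as Drive permits, the reverse mapping becomes ambiguous and this sketch raises ValueError, which is why the same trick does not carry over unchanged.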

So, how do we deal with this?

Assuming that we are only doing read-only operations, the URIs used by GVfs’ Google Drive backend look like:

google-drive://foo@gmail.com/0B9J_DkziBDRFTj1TPam9kbnc/0B9J_GlyfBDRBHj1TRWL9khrc

The backend has its own scheme, the account is mentioned, and the path is made up of identifiers. Quite easy. The server doesn’t care about having a POSIX-like path, but we create one to fit into GIO’s POSIX-like world view.

These IDs are decidedly ugly and unreadable, so we use the standard::display-name attribute to tag them with the titles. If you run g_file_query_info on one of these URIs, you will get the ID as the standard::name and the title as the standard::display-name. Thus, the user won’t encounter any ugliness unless she deliberately goes looking for it.

It gets complicated when we go beyond read-only access.

Let’s look at creating new files again. Since the GIO and GVfs APIs are designed around the POSIX model, a GVfs backend receives the request as:

create ("/id1/id2/some-title")

Even though the application thinks that it has created a new file named “some-title” inside “/id1/id2” that can be accessed by the path “/id1/id2/some-title”, the new file’s identifier is not “some-title”. It is whatever the server generated for us, say “id3”. If the application is to carry on the illusion of accessing the file as “/id1/id2/some-title”, then the backend has to map this “fake” path back to “/id1/id2/id3”. This has to happen somewhat recursively, because if we are copying a directory, then multiple elements in the file path can be “fake”, e.g., “/id1/id2/title-of-subdir/title-of-file”, and so on.
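The mapping from a “fake” path back to real IDs can be sketched as a component-by-component walk. This is a toy model under stated assumptions, not the GVfs backend: ToyBackend, its sequential “idN” identifiers and the resolve method are all hypothetical, and the sketch treats a component as a title only when it is not the ID of a child of the current folder.

```python
import itertools


class ToyBackend:
    """Hypothetical model of a backend that maps title-based paths to IDs."""

    ROOT = "root"

    def __init__(self):
        self._counter = itertools.count(1)
        self.entries = {}  # id -> (parent_id, title)

    def create(self, parent_id, title):
        new_id = f"id{next(self._counter)}"  # stands in for a server-generated ID
        self.entries[new_id] = (parent_id, title)
        return new_id

    def resolve(self, path):
        """Return (real_path, was_volatile) for a path of IDs and/or titles."""
        current, real, volatile = self.ROOT, [], False
        for part in path.strip("/").split("/"):
            if self.entries.get(part, (None, None))[0] == current:
                child = part  # the component already is a real ID
            else:
                matches = [i for i, (p, t) in self.entries.items()
                           if p == current and t == part]
                if not matches:
                    raise FileNotFoundError(path)
                child = matches[0]  # ambiguous if titles are duplicated
                volatile = True
            real.append(child)
            current = child
        return "/" + "/".join(real), volatile


backend = ToyBackend()
id1 = backend.create(ToyBackend.ROOT, "Photos")
id2 = backend.create(id1, "2015")
id3 = backend.create(id2, "title-of-subdir")
id4 = backend.create(id3, "title-of-file")

print(backend.resolve("/id1/id2/title-of-subdir/title-of-file"))
# → ('/id1/id2/id3/id4', True)
print(backend.resolve("/id1/id2/id3/id4"))
# → ('/id1/id2/id3/id4', False)
```

The title lookup is the fragile part: it depends on the titles still being what they were when the path was handed out, and it has to pick arbitrarily if two children share a title.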

We call these volatile paths. These are identified by the standard::is-volatile attribute and you can use standard::symlink-target to find the actual ID that it maps to. It is recommended that applications map volatile paths to the real IDs as soon as possible because the mapping can break if a parallel operation changes the title of the file.

In the specific case where you have created a new file and are writing to it through a GFileOutputStream, it is better to use g_file_output_stream_query_info instead of g_file_query_info. The output stream knows exactly which file it is writing to, and doesn’t have to do the mapping based on the volatile path, which could fail if the contents of the Drive changed out of band.
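Why the stream is the safer source of information can be illustrated with another toy model. None of this is GIO: ToyOutputStream, lookup_by_volatile_path and the entries table are hypothetical stand-ins, with the stream simply remembering the ID of the file it was opened on.

```python
# Toy server state: one freshly created file, keyed by its ID.
entries = {"id3": {"parent": "id2", "title": "some-title", "size": 0}}


def lookup_by_volatile_path(parent_id, title):
    """Path-based lookup: find a child of parent_id by its title."""
    for file_id, entry in entries.items():
        if entry["parent"] == parent_id and entry["title"] == title:
            return file_id
    return None


class ToyOutputStream:
    """Holds on to the ID of the file it writes to."""

    def __init__(self, file_id):
        self.file_id = file_id

    def write(self, data):
        entries[self.file_id]["size"] += len(data)

    def query_info(self):
        # No title lookup needed: the stream already knows the ID.
        return {"id": self.file_id, **entries[self.file_id]}


stream = ToyOutputStream("id3")
stream.write(b"hello")

# Someone retitles the file out of band (web browser, another computer).
entries["id3"]["title"] = "renamed elsewhere"

print(lookup_by_volatile_path("id2", "some-title"))  # → None
print(stream.query_info()["id"])                     # → id3
```

The path-based lookup fails as soon as the title changes underneath it, while the handle that kept the ID is unaffected.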

For working examples, see Nautilus.

Written by Debarshi Ray

13 September, 2015 at 01:21

6 Responses

  1. Wow, this is an excellent explanation of what standard::is-volatile means. Thank you! Is this something that application developers need to take into account in general if an application creates a new file that the user chooses via a file chooser, since the user may choose to create the file on Google Drive? Or does it only apply if you write file management software?

    Would you consider also adding an explanation about how API users should deal with is-volatile in the API documentation? Personal blogs are really good for in-depth explanations like this but unfortunately they aren’t very discoverable for (often first-time) users of the API documentation.

    Philip Chimento

    13 September, 2015 at 19:41

    • I am glad that you liked it. If you are using anything like g_file_make_directory, g_file_copy, g_file_create, g_file_replace, etc. that can end up creating a new file, you should use standard::is-volatile to resolve it to the real path. However, unless someone is modifying the Drive from a web browser or another computer, things will still work if the application doesn’t resolve the volatile paths because the GVfs backend will keep track of the mapping.

      The reason I called out “file management software” is because things like Nautilus and the GtkFileChooserWidget implementations tend to stress the GIO file access APIs the most, and can have visibly broken behaviour if they don’t take care of this. For example, when you create a new file, Nautilus would normally know about the volatile path, while the GFileMonitor watching the directory would report the “real” path. This would mislead Nautilus into showing two new files: a file and a symlink pointing to the file. Also, the fact that the path can stay unchanged even after a set_display_name call was unexpected. These are things that really should be fixed.

      Regarding the GIO documentation … There is already some terse text out there. I need to think of a way to make it more understandable without making it too verbose.

      Debarshi Ray

      14 September, 2015 at 08:21

  2. […] Despite Android, the Linux world has not really been loved by Google so far; still, as of version 3.18, GNOME now offers access to data on Google Drive. File managers and other applications can now access Google Drive storage, something many users have surely been waiting for. However, developers apparently need to take a few peculiarities into account to make this access possible. According to our colleagues, more on this can be found here. […]

  3. Hi Debarshi, thank you for the great effort!
    Just one question: using the Google Drive backend on GNOME 3.18 with my Google account, I noticed that many files that I previously deleted (some months ago) are still there. Is this the correct behaviour for the backend? Will these files be removed from the view in the future?

    Thank you!
    Paolo

    Paolo Leoni

    13 November, 2015 at 15:17

    • I would suggest filing a bug against gvfs on bugzilla.gnome.org with some details. Were those files that were shared by someone else? Are they still in your trash on Google Drive? Etc.

      Debarshi Ray

      9 April, 2016 at 01:28

