Google Drive and GNOME — what is a volatile path?
TL;DR: If you are using any GFile API that can create a new file or directory, then please take care of “volatile” paths. Look at standard::is-volatile, g_file_output_stream_query_info and g_file_query_info. Read further if you don’t trust me, or if you develop file management software.
Like other cloud storage services, Drive is backed by a database. Every file or folder is identified by an opaque, server-generated, immutable blob (say “0B9J_DkziBDRFTj1TRWLPam9kbnc”), and has a human-readable title that can be displayed in the user interface (say “Summer Vacation”). This is unlike POSIX filesystems where a file is identified by its path and, barring encoding issues, the basename of that path is its name. i.e., the name of the file “/home/alice/Summer\ Vacation” is “Summer Vacation”. Sounds like an innocuous distinction at first, but this is at the heart of the issue.
Let’s look at a few operations to get a feel for this.
mv /home/alice/foo /home/alice/bar
We need the paths to carry out the operation and, once it has successfully finished, the file’s path changes. Since files are identified by their paths, two files called “bar” cannot coexist in the same location: if another “bar” already existed there, the move would either replace it or, when overwriting is not allowed, be refused.
set_title (id_of_foo, bar)
We only need the identifier (or ID) and the new title for the operation and once successfully finished, the title changes but the ID remains the same. Since IDs are unique and immutable, you can have two files with the same title in the same location. That’s a bit strange, you might think, but not so much.
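To make the contrast concrete, here is a minimal sketch in Python. Both classes are hypothetical toy models written for this post, not Drive or GIO code: PosixDir keys its entries by name and models a rename that refuses to overwrite, while DriveFolder keys its entries by a generated ID and treats the title as plain metadata.

```python
import uuid


class PosixDir:
    """Toy directory where the name *is* the identity of the file."""

    def __init__(self):
        self.entries = {}  # name -> content

    def rename(self, old, new):
        # Two entries cannot share a name; this toy refuses to overwrite.
        if new in self.entries:
            raise FileExistsError(new)
        self.entries[new] = self.entries.pop(old)


class DriveFolder:
    """Toy folder where the ID is the identity and the title is metadata."""

    def __init__(self):
        self.titles = {}  # id -> title

    def add(self, title):
        file_id = uuid.uuid4().hex  # stand-in for a server-generated ID
        self.titles[file_id] = title
        return file_id

    def set_title(self, file_id, title):
        # No uniqueness check: the ID stays the same, only the title changes.
        self.titles[file_id] = title
```

Renaming “foo” to “bar” in PosixDir raises FileExistsError when “bar” is already present, whereas set_title happily leaves two entries titled “bar” that differ only in their IDs.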
Creating a New File
If you want a file named “foo” in a certain location, barring encoding issues, you ask for a path that has “foo” appended to the path of the location. It will work as long as there isn’t another “foo” in the same location.
create_file (id_of_parent_folder, foo) → id_of_new_file
Notice that we cannot specify the ID of the new file. We provide a title and the server generates the ID for it. Again, there is no restriction on having two files with the same title in the same location. This is where it starts getting weird.
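The same idea as a small Python sketch. ToyDrive and find_by_title are made-up names for illustration, and the counter merely stands in for whatever opaque ID a real server would hand back.

```python
import itertools


class ToyDrive:
    """Toy model of a store where the server, not the caller, picks the ID."""

    def __init__(self):
        self._next = itertools.count(1)
        self.files = {}  # id -> (parent_id, title)

    def create_file(self, parent_id, title):
        # The caller supplies only a parent and a title.
        new_id = f"id{next(self._next)}"
        self.files[new_id] = (parent_id, title)
        return new_id

    def find_by_title(self, parent_id, title):
        # A title can match zero, one, or several files in the same folder.
        return [
            file_id
            for file_id, (parent, name) in self.files.items()
            if parent == parent_id and name == title
        ]
```

Calling create_file twice with the same parent and the title “foo” succeeds both times and returns two different IDs, so a lookup by title then yields two candidates instead of one.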
GIO and GVfs
The file handling APIs in GIO are based on the POSIX model, with some room to accommodate minor deviations. They are ill-equipped to deal with a case like this.
As a quick aside, both MTP and PTP are somewhat similar in this regard. Both use IDs to refer to objects (think of them as files) stored on the phone or camera, and we have backends implementing them in GVfs. These backends construct POSIX-like paths using the human-readable titles of the files, and have an internal cache mapping IDs to paths and vice versa. Easy enough, but there are two important differences:
- MTP allows no parallelism. You can only perform one operation at a time; a new one won’t start until the earlier one has finished. So, the GVfs backend can assume complete control over the storage. Google Drive, on the other hand, is massively parallel. You might be modifying it via GVfs, via a web browser, from a different computer, or someone might be sharing something with you from yet another computer. It is better to rely on the server-generated IDs to identify files as much as possible, and let the server worry about keeping the data consistent.
- MTP and PTP are usually backed by a traditional filesystem on the device, e.g., FAT12, FAT16, FAT32, exFAT or ext3. Therefore, the issue of duplicate titles doesn’t arise.
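A two-way cache of the kind described for the MTP and PTP backends could be sketched like this in Python. PathCache is a hypothetical illustration, not the actual GVfs data structure; it only shows why such a cache relies on titles being unique within a folder.

```python
class PathCache:
    """Toy two-way cache between object IDs and title-based paths."""

    def __init__(self):
        self.id_to_path = {}
        self.path_to_id = {}

    def insert(self, object_id, path):
        # A path can point at only one object, so duplicate titles in the
        # same folder cannot be represented.
        owner = self.path_to_id.get(path)
        if owner is not None and owner != object_id:
            raise ValueError(f"{path!r} already maps to {owner!r}")
        # Drop the stale path if the object was renamed or moved.
        old_path = self.id_to_path.get(object_id)
        if old_path is not None:
            del self.path_to_id[old_path]
        self.id_to_path[object_id] = path
        self.path_to_id[path] = object_id
```

With a device-side filesystem that forbids duplicate names, the ValueError branch is never hit. With Drive it would be, which is one reason the Drive backend cannot simply reuse this approach.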
So, how do we deal with this?
Assuming that we are only doing read-only operations, the URIs used by GVfs’ Google Drive backend look like:
The backend has its own scheme, the account is mentioned, and the path is made up of identifiers. Quite easy. The server doesn’t care about having a POSIX-like path, but we create one to fit into GIO’s POSIX-like world view.
These IDs are decidedly ugly and unreadable, so we use the standard::display-name attribute to tag them with the titles. If you run g_file_query_info on one of these URIs, you will get the ID as the standard::name and the title as the standard::display-name. Thus, the user won’t encounter any ugliness unless she deliberately goes looking for it.
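As a rough illustration of that split, the following Python sketch mimics what a query returns for a path made of IDs. The functions and the titles table are invented for this example; only the two attribute names come from GIO.

```python
def query_info(path, titles):
    """Return toy file info for the last element of an ID-based path."""
    file_id = path.rstrip("/").rsplit("/", 1)[-1]
    return {
        "standard::name": file_id,                  # the opaque ID
        "standard::display-name": titles[file_id],  # the readable title
    }


def display_path(path, titles):
    """Render an ID-based path the way a file manager might show it."""
    return "/".join(titles[element] for element in path.strip("/").split("/"))


# Hypothetical IDs and titles.
titles = {"id1": "My Drive", "id2": "Summer Vacation"}
info = query_info("/id1/id2", titles)
```

Here info carries “id2” as the name and “Summer Vacation” as the display name, and display_path turns “/id1/id2” into “My Drive/Summer Vacation”.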
It gets complicated when we go beyond read-only access.
Let’s look at creating new files again. Since the GIO and GVfs APIs are designed around the POSIX model, a GVfs backend receives the request as:
Even though the application thinks that it has created a new file named “some-title” inside “/id1/id2” that can be accessed by the path “/id1/id2/some-title”, the new file’s identifier is not “some-title”. It is whatever the server generated for us, say “id3”. If the application has to carry on the illusion of accessing the file as “/id1/id2/some-title”, then the backend has to map this “fake” path back to “/id1/id2/id3”. This has to happen somewhat recursively because, if we are copying a directory, multiple elements in the file path can be “fake”, e.g., “/id1/id2/title-of-subdir/title-of-file”, and so on.
We call these volatile paths. These are identified by the standard::is-volatile attribute and you can use standard::symlink-target to find the actual ID that it maps to. It is recommended that applications map volatile paths to the real IDs as soon as possible because the mapping can break if a parallel operation changes the title of the file.
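The mapping from a volatile path to real IDs can be sketched as a walk over the path elements. This is a toy model in Python, not the backend's actual algorithm: the children table, the "root" key and the rule that an ambiguous title is an error are all assumptions made for the example. The returned pair loosely plays the role of standard::symlink-target and standard::is-volatile, except that it holds the whole resolved path rather than a single ID.

```python
def resolve(path, children):
    """Map a possibly volatile path to IDs.

    children maps a parent ID to {child_id: title}.
    Returns (real_path, is_volatile).
    """
    real = []
    parent = "root"
    volatile = False
    for element in path.strip("/").split("/"):
        entries = children.get(parent, {})
        if element in entries:
            child = element  # already a real ID
        else:
            # Treat the element as a title and look for exactly one match.
            matches = [cid for cid, title in entries.items() if title == element]
            if len(matches) != 1:
                raise LookupError(f"cannot resolve {element!r} under {parent!r}")
            child = matches[0]
            volatile = True
        real.append(child)
        parent = child
    return "/" + "/".join(real), volatile


# Hypothetical folder tree.
children = {
    "root": {"id1": "My Drive"},
    "id1": {"id2": "Photos"},
    "id2": {"id3": "some-title", "id4": "title-of-subdir"},
    "id4": {"id5": "title-of-file"},
}
```

With this tree, “/id1/id2/some-title” resolves to “/id1/id2/id3” and is flagged as volatile, and “/id1/id2/title-of-subdir/title-of-file” resolves to “/id1/id2/id4/id5”. If someone renames “some-title” in the meantime, the lookup raises LookupError, which is the kind of breakage that makes early mapping worthwhile.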
In the specific case where you have created a new file and are writing to it through a GFileOutputStream, it is better to use g_file_output_stream_query_info instead of g_file_query_info. The output stream knows exactly which file it is writing to, and doesn’t have to do the mapping based on the volatile path, which could fail if the contents of the Drive changed out of band.
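A toy Python model shows why the stream is the safer thing to ask. ToyDrive and ToyOutputStream are hypothetical stand-ins, not the GIO classes: the stream remembers the ID it was opened for, while the title-based lookup has to search again every time.

```python
class ToyDrive:
    """Toy store keyed by ID, with a title-based lookup on top."""

    def __init__(self):
        self.titles = {}  # id -> title
        self._counter = 0

    def create_file(self, title):
        self._counter += 1
        file_id = f"id{self._counter}"
        self.titles[file_id] = title
        return file_id

    def query_info_by_title(self, title):
        # Analogous to querying through a volatile path.
        matches = [fid for fid, name in self.titles.items() if name == title]
        if len(matches) != 1:
            raise LookupError(f"cannot resolve title {title!r}")
        return {"standard::name": matches[0], "standard::display-name": title}


class ToyOutputStream:
    """Toy stream that holds on to the ID of the file it writes to."""

    def __init__(self, drive, file_id):
        self.drive = drive
        self.file_id = file_id

    def query_info(self):
        # No title lookup needed: the ID is already known.
        return {
            "standard::name": self.file_id,
            "standard::display-name": self.drive.titles[self.file_id],
        }
```

If a file is created as “report” and then retitled to “final” from a web browser while the stream is open, query_info_by_title("report") raises LookupError, but the stream's query_info still returns the right ID together with the new title.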