5 May 2013

The SyncFileSystem

(Disclosure: I work for the Google Drive team, but not on anything related to sync, or on anything related to what’s discussed here.)

HTML5 has multiple APIs for apps to store data offline, sandboxed from other apps and from user data:

  • Web storage, which lets you store a persistent map of key-value pairs.

  • Indexed DB, which is an object database — it lets you store arbitrary JavaScript objects, which are indexed and therefore searchable.

  • The File API, which gives apps a hierarchical filesystem to store files in [1]. In Chrome, these map to files on the OS filesystem.

But, as the TechCrunch article describes, these are all per-device. If you want to sync data across devices, you have to build your own server, perhaps using AWS, Google Compute Engine, or Microsoft Azure. These provide VMs, letting you build practically anything on top of their infrastructure [2].

Or you could use Google App Engine, which is far simpler (I was able to get a simple server up and running in 45 minutes of coding), but more limited. It’s simpler because, for example, you don’t have to worry about scaling: more instances of your app are spun up automatically as needed. You don’t have to worry about datacenter downtime either: apps are migrated between datacenters as needed. And so forth. The limitations are being rolled back one after another as time goes by, but it’s still much more limited than VMs.

What could be even simpler? Why, not having a server at all! Your app would write to the store on one machine, and the changes would get synced over to the user’s other machines, without you having to build your own server.

iCloud did this first, with three APIs, which are per-app and sandboxed (as with the HTML5 APIs):

  • a hierarchical filesystem (similar to the HTML5 File API)

  • a key-value pair store (similar to Web Storage)

  • Core Data syncing, which syncs SQL databases between machines [3].

When iCloud came out, the second and third options were not commonly offered by sync tools like Dropbox, which offered only the first. Besides, Dropbox wasn’t sandboxed, and I’d never give an app access to my entire Dropbox, with all my financial and other sensitive data. That is, I used Dropbox as a way to sync and back up my files, not as an API for apps.

Now Chrome has introduced a SyncFileSystem API. This is API-compatible with the HTML5 File API, as far as I can understand, but the files you write are synced to the user’s Chrome instances on other machines via Google Drive [4]. As with iCloud, these files are not visible to the user. But it’s seamless to users, and easy for developers to use, in the sense that they don’t have to build their own server.

This works better for web apps than the Dropbox API, because it works offline, while making JSON-RPC calls to the Dropbox server obviously doesn’t. Besides, local is faster than networked.

It works better than the iCloud API because the latter isn’t available to web apps [5], only to native apps running under Apple-approved circumstances [6].

I was concerned when I first heard of the SyncFileSystem API, because denying the user access to their own files seemed scary. What if a future version of Microsoft Office refused to let you open your files in iWork or LibreOffice, or upload them to Google Docs?

Security shouldn’t come at the cost of lock-in. Sure, sandbox the filesystem, so that malicious, hacked or merely careless apps don’t end up snooping through other apps’ data. But give users an Open With option, or a way to copy or move the data out of the app’s sandbox, so that the sandbox doesn’t become a prison.

But, remember, not all data is presented to the user as files, and perhaps not even stored directly as files. A notes app may present the user with a list of notes, freeing him from dealing with files on the filesystem. And the notes may not be stored directly as files — they may be stored in an SQLite database.

Besides, if we’re worried about an app locking users in, the app can already do so, by building its own backend on App Engine or AWS. The SyncFileSystem is merely a simpler sync mechanism.

And a more limited one, which works for only some apps. When you don’t have a server, you lose a lot of control. For example, you don’t have a single source of truth anymore. Sync and conflict resolution must be distributed. That is, if a user of your app has three devices, it’s possible that two of them are in the process of syncing, when the third one comes in and starts syncing as well. With a server, you can sync one device at a time, to keep things simple, using a lock on the server.
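To make the lock idea concrete, here is a minimal sketch of a server that lets one device sync at a time. The `SyncLock` class and its method names are hypothetical, made up for illustration; a real server would also need timeouts for devices that disappear mid-sync.

```typescript
// Hypothetical server-side lock: at most one device syncs at a time.
class SyncLock {
  private holder: string | null = null;
  private readonly waiting: string[] = [];

  // Returns true if `deviceId` may sync now; otherwise queues it and returns false.
  begin(deviceId: string): boolean {
    if (this.holder === null) {
      this.holder = deviceId;
      return true;
    }
    if (this.holder === deviceId) return true; // already holds the lock
    if (this.waiting.indexOf(deviceId) === -1) this.waiting.push(deviceId);
    return false;
  }

  // Releases the lock and hands it to the next queued device.
  // Returns the new holder, or null if nobody is waiting.
  end(deviceId: string): string | null {
    if (this.holder !== deviceId) {
      throw new Error(deviceId + " does not hold the sync lock");
    }
    const next = this.waiting.shift();
    this.holder = next === undefined ? null : next;
    return this.holder;
  }
}
```

With three devices, the second and third simply wait their turn. Without a server there is no single place to put such a lock.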

You can also enforce invariants on the server, if you have one. For example, imagine a social network that has a friends list, and further lets you star a friend to mark them as a close friend. Let’s say you implement this using two files on the SyncFileSystem (or on iCloud or Dropbox, for that matter): one for the friends list and one for the close friends list. Now, if the user adds a friend and immediately stars them, it’s possible the close friends list gets synced first to another device, which ends up seeing a close friend who’s not a friend. This is an illegal state for the system. If you had a server, you could enforce this invariant, and make the client first add the friend and then mark the friend as a close friend, or do both at once, to keep the system in a consistent state.
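Here is a small sketch of that scenario. It models each device as two in-memory lists rather than real files, and all the names (`Replica`, `syncFile`, `serverStar`) are hypothetical. It assumes the sync layer copies one file at a time, in no guaranteed order.

```typescript
type FileName = "friends" | "closeFriends";
type Replica = { friends: string[]; closeFriends: string[] };

// The invariant from the text: every close friend is also a friend.
function isConsistent(r: Replica): boolean {
  return r.closeFriends.every((person) => r.friends.indexOf(person) !== -1);
}

// File-level sync: copies ONE file from source to target.
function syncFile(source: Replica, target: Replica, file: FileName): void {
  target[file] = source[file].slice();
}

// What a server could do instead: refuse to star someone who is not a friend.
function serverStar(r: Replica, person: string): void {
  if (r.friends.indexOf(person) === -1) {
    throw new Error(person + " is not a friend yet");
  }
  if (r.closeFriends.indexOf(person) === -1) r.closeFriends.push(person);
}

const phone: Replica = { friends: ["Ann"], closeFriends: [] };
const laptop: Replica = { friends: ["Ann"], closeFriends: [] };

// The user adds Bob and immediately stars him on the phone.
phone.friends.push("Bob");
phone.closeFriends.push("Bob");

// The close friends file happens to arrive first.
syncFile(phone, laptop, "closeFriends");
const consistentMidSync = isConsistent(laptop); // false: Bob is starred but not a friend

syncFile(phone, laptop, "friends");
const consistentAfterSync = isConsistent(laptop); // true once both files have arrived
```

The laptop does recover eventually, but any code that reads its data between the two copies sees the illegal state.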

The order of updates is also lost. So, for example, you can’t provide an undo function that restores everything to the state it was in at a certain time in the past. Google Contacts provides this, so if you messed up your contacts, perhaps via a sync that went awry, you can easily restore them to a good state.
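As a rough sketch of why a server makes this possible: if every change goes through one place, that place can number the changes and replay them up to any point. The `ContactServer` class below is hypothetical and uses sequence numbers as a stand-in for timestamps; it says nothing about how Google Contacts actually does it.

```typescript
// One change to a contact; phone === null means the contact was deleted.
interface Change {
  seq: number;
  contact: string;
  phone: string | null;
}

class ContactServer {
  private readonly log: Change[] = [];
  private nextSeq = 1;

  // The server assigns the sequence number, so every change has one agreed position.
  apply(contact: string, phone: string | null): number {
    const seq = this.nextSeq++;
    this.log.push({ seq, contact, phone });
    return seq;
  }

  // Replays the log up to and including `seq` to rebuild that past state.
  stateAt(seq: number): Record<string, string> {
    const state: Record<string, string> = {};
    for (const change of this.log) {
      if (change.seq > seq) break;
      if (change.phone === null) delete state[change.contact];
      else state[change.contact] = change.phone;
    }
    return state;
  }
}

const server = new ContactServer();
server.apply("Ann", "555-0100");
const lastGood = server.apply("Bob", "555-0101");
server.apply("Bob", null); // a sync gone awry deletes Bob
const restored = server.stateAt(lastGood); // Ann and Bob are both back
```

With files synced independently between devices, there is no agreed numbering to replay.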

The server-less model is also more prone to conflicts than a server-based one. Imagine adding a friend A to your friend list, and simultaneously removing a friend B, but on another device. With the server-based system, there’s no conflict — the server gets two requests, and executes them, and the clients are updated normally on their next sync. But this operation (add A on one device, and remove B on another) would cause a conflict in the SyncFileSystem/iCloud/Dropbox world. Once you apply the user’s action locally, adding or removing a person from the friend list, you can’t easily go back to intent, to ask “what was the user doing, anyway?” This works naturally in the server-based model.
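The difference can be sketched like this, assuming the friend list is one file and the sync layer compares whole file contents. The `Op` type and the function names are hypothetical.

```typescript
// The user's intent, as a server would receive it.
type Op = { kind: "add" | "remove"; name: string };

function applyOp(list: string[], op: Op): string[] {
  if (op.kind === "add") {
    return list.indexOf(op.name) === -1 ? list.concat([op.name]) : list.slice();
  }
  return list.filter((n) => n !== op.name);
}

function sameList(a: string[], b: string[]): boolean {
  return JSON.stringify(a) === JSON.stringify(b);
}

// File-level sync sees only contents: two devices changed the same file,
// starting from the same base, and ended up with different results.
function isFileConflict(base: string[], a: string[], b: string[]): boolean {
  return !sameList(a, base) && !sameList(b, base) && !sameList(a, b);
}

const base = ["B", "C"];
const addA: Op = { kind: "add", name: "A" };
const removeB: Op = { kind: "remove", name: "B" };

// Serverless: each device applies its action locally, then the files meet.
const phoneFile = applyOp(base, addA); // ["B", "C", "A"]
const laptopFile = applyOp(base, removeB); // ["C"]
const conflict = isFileConflict(base, phoneFile, laptopFile); // true

// Server: it receives the two intents and applies them one after the other.
const serverList = [addA, removeB].reduce(applyOp, base); // ["C", "A"]
```

The server never has to guess what the user meant, because it is handed the operations rather than two diverged copies of a file.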

As a final example of the drawbacks of the SyncFileSystem, imagine updating your sync logic. Now you have two versions of your app, running on different devices, that must both sync. There could even be N versions, if the user loads your web app in a browser and doesn’t refresh it (and it doesn’t auto-refresh). So now you have N versions of your app running on M devices, and god help you get this right. Whereas with the server model, a lot of your sync logic is in only one place, the server, so it can be updated with the assurance that the previous logic isn’t hanging around somewhere. Even if you have some sync logic on the client, the server can be explicitly coded to handle each client version, so you don’t have an N^2 combinatorial explosion where any of N versions of your app can interact with any of N versions on another device. With a server, you can even force clients to update if they are running too old a version. That way, you don’t have to deal with all possible versions.
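Forcing old clients to update can be as simple as a version check at the start of every sync. This is only a sketch: `MIN_SUPPORTED_VERSION`, `SyncReply` and `handleSync` are made-up names, and the version numbers are arbitrary.

```typescript
// Hypothetical: the oldest client sync logic the server still understands.
const MIN_SUPPORTED_VERSION = 3;

type SyncReply =
  | { status: "ok" }
  | { status: "upgrade-required"; minVersion: number };

// Every sync request carries the client's version; the server decides
// whether to proceed or to tell the client to reload.
function handleSync(clientVersion: number): SyncReply {
  if (clientVersion < MIN_SUPPORTED_VERSION) {
    return { status: "upgrade-required", minVersion: MIN_SUPPORTED_VERSION };
  }
  return { status: "ok" };
}
```

A stale tab running version 2 gets told to reload instead of being allowed to write data in an old format.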

These are all limitations of the SyncFileSystem model. But there are also apps for which it works well, like a game that saves high scores. This can perhaps afford to be eventually consistent. As another example, imagine a cloud-based word processor (like the one in Google Docs, incidentally) that writes documents to the SyncFileSystem. Since each file is independent, sync is far easier.
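High scores work because merging them needs no coordination. A minimal sketch, assuming scores are kept as one best score per player (the `Scores` type and `mergeScores` are hypothetical):

```typescript
type Scores = Record<string, number>;

// Merges two copies of the score table by keeping the higher score per player.
function mergeScores(a: Scores, b: Scores): Scores {
  const merged: Scores = {};
  for (const player of Object.keys(a)) {
    merged[player] = a[player];
  }
  for (const player of Object.keys(b)) {
    merged[player] = player in merged ? Math.max(merged[player], b[player]) : b[player];
  }
  return merged;
}
```

Taking the maximum gives the same answer whichever device syncs first, and merging the same data twice changes nothing, so every device ends up with the same table no matter how the syncs interleave.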

So that’s the bottom line for the SyncFileSystem — just as building a server on App Engine is simpler than building it on AWS or Google Compute Engine, not building a server, and leaving sync to Chrome, is even simpler. But even more restricted. Just as you can’t run everything on App Engine that you can on AWS, you can’t use the SyncFileSystem for all apps. Simpler to use, if it works for you, but more restricted.

[1] There’s also a way to read and write user documents, with user permission, via a File Open / Save dialog box, or drag-and-drop. But this is not relevant to our discussion, which is about a sandboxed place for apps to store data, which may not even be presented to the user as files. For example, a notes app shows you a list of notes, without bothering you about the details of how they are stored. They could be in an SQLite database for all we know.

[2] These are a mixture of IaaS (Infrastructure as a Service), like VMs, and PaaS (Platform as a Service), like DynamoDB. The “I” part lets you build whatever you want, while the “P” part gives you powerful building blocks that might otherwise take you years to develop. In that sense, they are the best of both worlds, but they still have more overhead (to the programmer) than something like App Engine.

[3] … which doesn’t work, but that’s a different matter.

[4] This does mean the user has to have a Drive account and be signed in.

[5] You can’t make JSON-RPC or REST calls to the iCloud server to access your data, as you can with Dropbox or Google Drive. Instead, iCloud is an API implemented in Apple operating systems — OS X and iOS.

[6] iCloud APIs are available only on iOS and on OS X, but in the latter case, only if the app was downloaded through the Mac App Store. And some iCloud APIs are available on Windows. Whereas Chrome runs on OS X, Windows, Android, iOS and Linux, and there’s Chrome OS. So the SyncFileSystem API in Chrome is far less locked-in than iCloud. Though it’s not ideal, since it’s not implemented by Safari or Firefox. Hopefully it will eventually be standardized in HTML.
