25 Dec 2013

Superfast synchronization for apps

Simplenote, the once-great [1] notes app, syncs quickly. When you open the iOS [2] app, there’s a delay of a second or two, and then all the notes are instantly updated. Once the first note is updated, ALL the notes are updated as fast as you can see, and certainly faster than you can move your finger and tap a note. Unlike, say, Dropbox, which downloads updated files one by one.


I wondered how they might have implemented it to be so quick, which then set me thinking as to whether it’s possible to implement a sync protocol with just one HTTP round-trip, which is as fast as you can get. Here’s how such a protocol may work:


Imagine that when you edit a note, the edits are saved. Not the final state of the note, but the edits. For example, if you have a shopping list:


Bananas
Milk


and you insert a new item, Curd, between the two, the server stores not the final state of the note, consisting of:


Bananas
Curd
Milk


but also the edit itself (“Insert ‘Curd’ at the beginning of line 2”).


Now, if you store all the edits to the note from the beginning of time, when that note starts as an empty note, you can play back the sequence of edits to the end to generate the final state of the note. Or play back the edits from the beginning till any point in time, and end up a snapshot [3] [4] of the note at that point.


Let’s say the app on a particular device was synced to revision 10, while the state of the note on the server advanced to revision 15. Now, when the user opens the app, and it has to be brought up to date. Let’s assume for simplicity [5], that there weren’t any offline edits made to the stale data that need to be reconciled. In this case, the app can make a single HTTP request to the server saying, “I have revision 10. Please tell me what happened since then.” The server replies saying, “Oh, the current state is revision 15, which means five edits have happened since the last sync. They are: ‘insert X at position 45, delete positions 78-90, etc’.” Now the client app applies these edits, resulting in an updated note. This takes just one HTTP round-trip.


A notes app has more than one note, of course. But you can still view the notes list as one logical entity, consisting of notes with unique IDs, where the edits include an ID, such as “Insert the text ‘Curd’ at index 45 in the note with ID 578457485748957”. So, you’d store a sequence of edits to the entire notes list, rather than to one note. This may contain an edit to note A, followed by a edit to note B, followed again by an edit to note A, all in one list. The same replay logic above works to sync all the notes, rather than one, again with just one HTTP round-trip.


In fact, I noticed Simplenote replaying edits across notes in the same order in which I’d made the edits. When I edited note A, then edited note B, and then again note A, and later opened the app on another iDevice, I saw the edit to note A, then the one to note B, and finally the second edit to note A. Maybe the sync protocol above is what Simplenote really uses. In any case, what’s important is that you can use this technique in your app, for super fast synchronization.



[1] But now buggy and unreliable.


[2] The Android app stays in sync almost all the time, thanks to Android’s multitasking.


[3] You can also go in the reverse direction, storing the snapshots rather than the edits, and diffing the snapshots to generate the edits. But this is probably not a good idea because, as a general principle, you don’t want to needlessly convert data from one form to another, since the conversions could be lossy. What the app is getting from the user is edits, so it’s better to store them verbatim, rather than applying them and storing the result, and then trying to reverse-engineer the edits.


As a concrete example, if you have the note:


Bananas
Curd


and the user made an edit, resulting in:


Bananas
Curd
Curd


you can’t reverse-engineer the edits: did the user insert text before or after the second line in the original note? If you think it doesn’t matter, imagine that the Curd that already existed in the note was edited simultaneously on a different device, say to Slim Curd. Now, when you reconcile the edits, you need to know where the user inserted Curd, because it means you could end up with:


Bananas
Curd
Slim Curd


or with:


Bananas
Slim Curd
Curd


depending on what the original edit was. You don’t want to throw away that information, because there’s no way to recover it. So, store the edits.


[4] Storing the edits doesn’t mean that you have to play back an unbounded number of edits, which could number in the thousands, to generate the final state of the note from scratch. You can generate snapshots periodically and cache them, so that you have to replay at most 10 edits, say, to generate any desired snapshot. But the edits are still the authoritative state of the document. You can also merge adjacent edits as long as no client is synced to a point in between them (the server can keep track of how many clients exist, and where each is synced). You can also throw away ancient history, keeping only the last 100 edits, say, merging everything before it into one edit.


[5] Resolving conflicting edits shouldn’t be hard, with operational transform: http://en.wikipedia.org/wiki/Operational_transform Maybe that can still be done with one HTTP round-trip, or maybe not. I’d be happy with a notes app that updated all notes that don’t have a conflicting edit, with one HTTP round-trip.


[6] 

No comments:

Post a Comment