More on device updates: it’s harder than it looks (11/6/2013)

(Originally posted on November 6, 2013)

I writ large rambling the other day regarding my own hands-on experience with cascading failures due to, among other things, the perils of pushing large, complex, irreversible, and required updates to IoT appliances – such as “Connected TVs” and the like.

In order to try to move the ball forward here (and not just complain!), here’s my contribution… some ideas on rules/policies that device designers should follow when it comes to “update architecture”:

Reversible: Updates should be reversible, in case of external compatibility issues (the update introduces changes in its external interface that breaks its integration with other systems) or internal issues (the update fails to install completely, or otherwise renders the device unusable)
Predictable: Users / controlling systems should be in control as to when an update cycle (and any ensuing device reboot) is initiated.
Non-intrusive: Updates should download in the background and if possible install in the background
Preserves durable settings / state: Updates should respect existing settings and configurations where possible

Obviously, not every “thing” in the “Internet of Things” will have enough memory or CPU power available so that its software engineer will be able to reliably follow the rules above… for some “things”, there may not be any “background” in the process model, or enough memory to store a complete operating system / firmware image during a background download, or even a UI with which to offer the user a choice in the matter.

But you get the general point: keep the user or controlling system in control of the update process, and make it reversible in case there are problems.

The natural presumption is that, when it comes to updates, a “thing” doesn’t exert an outsized negative effect on the operation of the hierarchy / network that it’s a part of. You wouldn’t expect a low-level leaf “thing” in a constellation of “things” to require a reset of the entire system just to take an update. The amount of risk that it injects into the system should not be larger than its role in the system.

Conversely, the more intelligent “things” out there – “Connected TVs”, for instance – which are sold like appliances but in reality are complex, stateful, devices – represent large challenges for “update architecture”. These devices are complex enough that updates can take many minutes (40 in my Apple TV example) and induce significant risk. But, to the user, these “things” are “appliances”, so the average person’s expectations do not match the complexity of the device. I just want to watch a movie – tell me why I need to wait 40 minutes for an update that I didn’t ask for, can’t postpone, don’t understand the value proposition for, and am, in fact, leery of?

Getting the “update architecture” right is challenging. Look at Microsoft’s recent stumble here (where Microsoft had to withdraw its major Windows 8.1 update). I’d argue that it’s taken years for Microsoft to get their approach to updates right, and that they generally do a pretty good job at it, which makes this failure stand out even more.

PS: As I write this, I’m thinking that there might be other sets of rules to think about here… such as how to archive the state/settings of a system of “things”, and how to apply such an archive to a system (to restore it to a previous known state)? A post for another time.

Share this:

Like this:

Related

Leave a Reply Cancel reply