this is stevenf.com

WARNING: A long, rambly exploration of the state of computing with no real conclusion follows.

Every geek I know shares, to some degree, the notion that the “desktop” metaphor for computers is outdated. What nobody seems to have a solid opinion on is what would take its place.

There have really only been two dominant UI metaphors in the short history of desktop computing:

  1. Keyboard + command line

  2. Mouse + desktop

A third metaphor, the pen, never really gained much traction.

On one hand, you had Tablet PC, which was really just metaphor #2 but with a pen in place of the mouse.

Then you had Newton OS and the original Palm OS, which were really the only two platforms that took into consideration the actual way humans might hold and operate a pen. But they approached it in different ways.

Palm OS focused on getting you from here to there in a minimum of pen taps. Newton OS was more gestural, a classic example being the process of creating a new note. Rather than clicking a button labeled “New Note”, you drew a horizontal line bisecting the screen, and that line transformed into the separator between the old note and the new.

There’s room for some argument as to which way was better. In terms of discoverability, the Palm’s “New Note” button is much more immediately obvious to the user. The Newton’s approach, however, takes up literally no screen space at all on an already small screen, leaving more room to view your data. It also happens to feel absolutely magical once you get used to it.

The main reason pen computing never took hold, I suspect, is the awkwardness of text entry. Handwriting recognition (although it improved quickly over the years) is still error-prone. And virtual software keyboards are tedious.

History then brings us to a fourth metaphor, direct interaction via multitouch, introduced to most people by the iPhone. It’s possibly the biggest new UI approach to hit the mass market in recent memory.

A single touch is basically a mouse, minus the abstraction. Rather than operating a physical device that moves a virtual finger around, you just go in with your actual finger. Lost along with the mouse is precision. Apple estimates that a fingertip touch has an area of about 40x40 pixels on the iPhone’s screen, versus 1x1 for a mouse pointer, and (I’d estimate) maybe 8x8 for a jittery pen tap. So you’d better have built-to-purpose software that anticipates this.
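To make "software that anticipates this" a little more concrete, here is a minimal sketch of fat-finger hit testing. It assumes the roughly 40x40 touch area mentioned above; the `Button` class, the `hit_test` function, and the layout are all made up for illustration and are not how any real touch framework does it.

```python
from dataclasses import dataclass


@dataclass
class Button:
    """A hypothetical on-screen control described by its rectangle."""
    name: str
    x: int
    y: int
    width: int
    height: int


def hit_test(buttons, tx, ty, touch_size=40):
    """Return the button nearest the touch centre among those the
    touch square overlaps, or None if the touch overlaps nothing."""
    half = touch_size / 2
    best, best_dist = None, None
    for b in buttons:
        # Distance from the touch centre to the rectangle (0 when inside it).
        dx = max(b.x - tx, 0, tx - (b.x + b.width))
        dy = max(b.y - ty, 0, ty - (b.y + b.height))
        if dx <= half and dy <= half:
            dist = dx * dx + dy * dy
            if best_dist is None or dist < best_dist:
                best, best_dist = b, dist
    return best


buttons = [Button("Back", 0, 0, 30, 30), Button("Next", 60, 0, 30, 30)]

# The touch lands in the gap between the two buttons, slightly closer to "Back".
near_miss = hit_test(buttons, 40, 15)
far_miss = hit_test(buttons, 45, 100)
```

With a 1x1 mouse pointer the first touch would hit nothing at all. Treating the finger as a square and picking the nearest control is one way to forgive the lost precision.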

When you move up from single to multi-touch, you re-enter the world of gestures. They’re still very hard to discover unless you’re taught them, but once you know them, they’re extremely gratifying to execute.

It takes two seconds to learn pinch-to-zoom, but if you handed an iPhone to someone who had never seen one and said “zoom in on this web page”, they’d have no clue how to do it. They would likely not even know it was possible to zoom unless you had told them.

However, once you showed them, it would immediately seem natural, and it’s hard to conceive of a more efficient way to perform zooming with human hands. Like the Newton’s “new note” separator, it provides functionality with no screen space required for controls, and provides a tactility that is extremely gratifying at what must be a very low level of the brain.

The benefit of pinch-to-zoom over previous zooming methods is so immediately apparent that it justifies the learning curve. That the learning curve is extremely small also helps. I find it fascinating that a huge portion of iPhone usability training is done via the TV ads, pre-sale. They’re both marketing and instruction.

So, if you’re going to manage to convince the world to change the way it does something, the change had better have a benefit that’s immediately obvious and take no more than a few seconds to explain.

Most touch devices so far are phones, and on our phones we’re still saddled with either a virtual software keyboard or a tiny, cramped hardware keyboard. Neither is a great experience. But text entry is so fundamental to everything we might want to do with a computer. Are we stuck with the keyboard (a holdover from the typewriter!) forever? Handwriting and voice recognition show no signs of replacing it any time soon. Short of mind-reading, we’re literally running out of physical points of contact for humans to get text into machines, and the keyboard is winning by a long shot.

Even if we assume that we magically solve the challenges of human interface overnight, there’s still the issue of the architecture of data itself.

All major operating systems today still use basically the same hierarchical file system that we’ve used pretty much forever. Is it a natural fit for the human mind? Not especially. But it’s “artificially natural” by virtue of millions of people being very familiar with it. If you have an unnatural way of doing something that just happens to be readily understood by an entire generation, then the innovative new “natural” way of doing things you’ve invented might in fact be unnatural by sheer virtue of the fact that it is different.

Every once in a while, there is an attempt to obsolete the concept of the hierarchical file system. The Newton had a unique object storage system: essentially a system-wide “soup” of data objects, such as your calendar events, address book contacts, and so on. Any app could dip into this database and pull out objects, even those put there by other applications. Applications could then look at them, maybe even modify or extend them, without needing the original application to intervene at all.
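The shared-store idea can be sketched in a few lines. This is a toy model under my own assumptions, not the Newton's actual API: the names `Soup`, `Entry`, `kind`, and `slots` are invented for illustration, and the sample event is made up.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Entry:
    """One object in the shared store (hypothetical shape)."""
    kind: str
    slots: dict[str, Any] = field(default_factory=dict)


class Soup:
    """A toy system-wide store that every app reads and writes directly."""

    def __init__(self) -> None:
        self._entries: list[Entry] = []

    def add(self, kind: str, **slots: Any) -> Entry:
        entry = Entry(kind, dict(slots))
        self._entries.append(entry)
        return entry

    def query(self, kind: str,
              where: Callable[[Entry], bool] = lambda e: True) -> list[Entry]:
        return [e for e in self._entries if e.kind == kind and where(e)]


soup = Soup()

# A calendar app stores an event.
soup.add("event", title="Dentist", day="Tuesday")

# A different app finds that event without involving the calendar app,
# then extends it with a slot of its own.
for event in soup.query("event", where=lambda e: e.slots["day"] == "Tuesday"):
    event.slots["travel_minutes"] = 20

print(soup.query("event")[0].slots)
```

The second app never asks the calendar app for anything. It talks to the store, and whatever it adds is visible to everyone else who looks.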

The result was an unprecedented interconnectedness of data among Newton applications, both first and third party, unmatched by any desktop environment that I know of. It was an entirely different universe from the iPhone, which basically has the Berlin Wall between every application. The iPhone lacks even an API to query the user’s calendar, even in a read-only way.

The Newton’s object store was an engineering marvel that fell apart as soon as you needed to exchange data with the outside world. You couldn’t just take a text file and send it over to your Newton because the Newton didn’t understand the concept of “file”. Your text first had to go through a conversion (via Newton Connection Utility) into an object format that some Newton application (in this case, probably Notes) could handle. Then you’d have the reverse problem going the other way.

Because desktop apps and Newton apps would never offer exactly the same feature set, these conversions inevitably lost some information. Nitty-gritty things like precise formatting, metadata, and so on are the first things out the window when you need to convert data between two formats. It leaves you with a “lowest common denominator” form of information exchange that’s more frustrating than just being able to send files around. But in order to “just send files around”, you’d have to jettison all your radical (and useful) innovations and go back to square one: the good old hierarchical file system.
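The "lowest common denominator" effect is easy to demonstrate. In this rough sketch the two field sets are hypothetical and match no real desktop or Newton format; the point is only that a round trip keeps the intersection of what both sides understand.

```python
# Hypothetical field sets for two formats that only partly overlap.
DESKTOP_FIELDS = {"text", "font", "author", "modified"}
HANDHELD_FIELDS = {"text", "modified", "ink_sketch"}


def convert(document: dict, target_fields: set) -> dict:
    """Keep only the fields the target format understands."""
    return {k: v for k, v in document.items() if k in target_fields}


original = {
    "text": "Buy milk",
    "font": "Helvetica 12",
    "author": "steven",
    "modified": "Tuesday",
}

# Desktop -> handheld -> desktop again.
on_handheld = convert(original, HANDHELD_FIELDS)
round_trip = convert(on_handheld, DESKTOP_FIELDS)

lost = sorted(set(original) - set(round_trip))
print(lost)  # ['author', 'font']
```

The text survives, but the formatting and the metadata are gone after a single trip out and back.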

Similarly, consider the Classic Mac OS and its resource forks. There were genuine, tangible benefits to having resource forks around. But any time you had to exchange a file with a non-Mac environment, a great deal of pain ensued. In the long run, interoperability won out over innovation.

And this seems to me to be the barrier to moving forward to any sort of next generation of computer interface, whatever it might be. Numerous projects (Squeak comes to mind) have put forth ideas as to how we might interact with data in the future. As amazingly revolutionary and beneficial as your new idea may be, you can’t escape this albatross of legacy data. Unless your new metaphor presents a new way of working that is such an obvious and dramatic improvement over the status quo (like pinch-to-zoom), there’s no compelling reason for anyone to laboriously learn, adapt, and migrate to the new environment.

It seems a bit fatalistic, but I can’t think of a way that the entire desktop metaphor can be overhauled without either everyone in the world switching over at once (which won’t happen), or becoming a “data island” like the Newton or Classic Mac OS.

I’d love to be proven wrong, but the only realistic way up seems to be to build atop the existing infrastructure, with all of its cruft, and keep abstracting it away over time, fingers crossed all the while that Moore’s Law has a few good decades left in it. Abstractions are rough on performance, but ease the pain of migration for a species (humans) that generally abhors change.

It’s conceivable that a seemingly innocent OS feature like Spotlight could go this direction. With Spotlight, you still have traditional files and folders underneath it all, but you can also get to the same data by asking for a specific email address, keyword, or date. You do not, strictly speaking, need to use the file system or the originating application to get there. One day, years from now, if and when Spotlight is advanced to the point of being a superior interface to your data, could we then just turn off the files and folders interface entirely?
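Here is a small sketch of that kind of abstraction: an index keyed on metadata that sits on top of ordinary paths. It is emphatically not how Spotlight is implemented. The file list, the attribute names, and the `find` function are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical files: path -> metadata that some importer has extracted.
files = {
    "/Users/me/Documents/Taxes/receipt.pdf": {
        "keywords": {"receipt", "tax"}, "from": "shop@example.com"},
    "/Users/me/Desktop/misc/notes.txt": {
        "keywords": {"tax", "todo"}, "from": None},
    "/Users/me/Mail/inbox/42.eml": {
        "keywords": {"lunch"}, "from": "shop@example.com"},
}

# Build the index once; the folder hierarchy plays no part in lookups.
index = defaultdict(set)
for path, meta in files.items():
    for word in meta["keywords"]:
        index[("keyword", word)].add(path)
    if meta["from"]:
        index[("from", meta["from"])].add(path)


def find(attribute: str, value: str) -> set:
    """Return every path whose metadata matches, wherever it lives."""
    return set(index.get((attribute, value), set()))


tax_files = find("keyword", "tax")
from_shop = find("from", "shop@example.com")
```

The files and folders are still there underneath, but nothing in `find` requires you to know which folder anything is in, or which application put it there.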

Another Mac OS X feature, Quick Look, although it’s off to a slow start, abstracts away the viewing of documents by having each application provide a standalone means of displaying its data. If there is one day a corresponding “QuickEdit”, will we have fully circled back to the original premise of component-based software, like OpenDoc?

Or is our trajectory locked toward web apps? It’s hard to ignore the popularity of web apps, and they already fulfill the promise of “running everywhere” that we’ve still not achieved in an elegant way with desktop software. Is the logical conclusion that operating systems eventually become just extremely capable web browsers? At this moment in history, it actually seems plausible, if maybe not a foregone conclusion. But the last time we talked about network computing, we called it “thin client” and it didn’t exactly pan out. We’ve stretched HTTP in amazing new directions, but even the most AJAX-y rich web app still lacks a certain snappiness and coherence that can be found in desktop software. Do greater bandwidth and a more refined transport protocol solve the problem?

I wish I knew the answer. Because then I’d probably be rich. I don’t have any grand schemes, but I find the discussion fascinating. Changing the world, it turns out, is fairly difficult. And the world always wants to veer in the opposite direction from the way you are turning the wheel.

Hi, I'm Steven Frank
