Mozilla Ubiquity
Aza Raskin & co. at Mozilla have released a preview of their new experiment, called Ubiquity. The goal is a Quicksilver-like interface for the browser that helps users complete tasks. This differs from how web interaction works today, which is information-based: a user finds disparate bits of information and combines them manually. With Ubiquity, the user specifies the task and the software figures out what information is needed to complete it. But can they do it?
To quote the announcement page:
The overall goals of Ubiquity are to explore how best to:
- Empower users to control the web browser with language-based instructions. (With search, users type what they want to find. With Ubiquity, they type what they want to do.)
- Enable on-demand, user-generated mashups with existing open Web APIs. (In other words, allowing everyone–not just Web developers–to remix the Web so it fits their needs, no matter what page they are on, or what they are doing.)
- Use Trust networks and social constructs to balance security with ease of extensibility.
- Extend the browser functionality easily.
The idea is interesting, but I see a few immediate and show-stopping roadblocks. To start, it all hinges on their ability to capture and interpret the intricacies of English (and the other languages they hope to target). Natural language processing is one of the most difficult tasks in computer science, if not the most difficult. We have long wanted computers to understand us, and our best attempts have produced systems that can, at best, mimic conversation. Systems like Alice (http://alice.pandorabots.com/) do a good job of responding to conversation, but do not do as well at understanding its intent. Ubiquity has to solve that problem to progress. From my research experience (http://www.firstmonday.org/issues/issue12_9/argamon/index.html), I know that some patterns can be found in text. But to actually 'interpret' it is a different matter.
Of course, restricting the language that can be used makes the job easier. But that does not work out in the long run. English is a moving target; there is no sound way to pick a proper, all-encompassing word list. And a restricted language becomes no different from a particularly verbose programming language (think AppleScript).
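To make the restricted-language point concrete, here is a minimal sketch in Python. The two command patterns are hypothetical and are not Ubiquity's actual command set; the point is only that a fixed grammar accepts one phrasing and rejects an equally natural one.

```python
import re

# Hypothetical fixed grammar: each verb has exactly one accepted phrasing.
COMMANDS = {
    "email": re.compile(r"^email (?P<what>.+) to (?P<who>\w+)$"),
    "map": re.compile(r"^map (?P<where>.+)$"),
}

def parse(text):
    """Return (verb, arguments), or None if the phrasing is outside the grammar."""
    text = text.strip().lower()
    for verb, pattern in COMMANDS.items():
        match = pattern.match(text)
        if match:
            return verb, match.groupdict()
    return None

# Accepted: matches the one phrasing the grammar knows.
print(parse("email this page to bob"))
# Rejected: same intent, different wording.
print(parse("send bob this page"))
```

The user ends up memorizing the accepted phrasings, which is exactly what learning a programming language's syntax feels like.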
What might be more feasible is a knowledge queue, built from the current context, where a task is completed by filling in the pieces of information the queue still needs. As a short example, a flight request could be queued up by showing a template with places for the flight information, then dates and times, then the people to contact, with restrictions added later. The system could guess the context from the page contents (and any available APIs being used), and do the duty of loading the context information into the proper places. More on this later. In the meantime, I will have to give this a proper whirl, and see how far along they have come.
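As a rough illustration of the knowledge queue idea, here is a minimal Python sketch. The slot names, the `build_queue` helper, and the page context values are all assumptions made up for this example; nothing here reflects how Ubiquity works.

```python
from collections import deque

# Hypothetical template for a flight request: the slots to fill, in order.
FLIGHT_SLOTS = ["origin", "destination", "date", "time", "contacts"]

def build_queue(slots, context):
    """Pre-fill slots found in the page context; queue the rest for the user."""
    filled = {slot: context[slot] for slot in slots if slot in context}
    pending = deque(slot for slot in slots if slot not in context)
    return filled, pending

# Assumed values, as if guessed from the page the user is viewing.
page_context = {"destination": "Chicago", "date": "2008-09-15"}

filled, pending = build_queue(FLIGHT_SLOTS, page_context)
print(filled)         # slots the system loaded from context
print(list(pending))  # slots the user is still asked for, in order
```

Here the user is only prompted for what the context could not supply, rather than having to phrase the whole request in words.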