Put away your keyboard: It’s time to talk to our computers
For decades, every major computer has employed what is known as a graphical user interface: a system of icons, pointers and buttons that lets us easily navigate the machines on our desks and in our pockets.
The GUI (pronounced “gooey”) is now so ubiquitous that we take it for granted: point a mouse or a finger at something and click or tap. Even a novice can pick it up in minutes.
Before Xerox introduced it in 1973 (Apple and Microsoft subsequently adopted it), people interacted with computers through a “command-line interface”, a system of instructions and responses that required users to type out precise strings of text to get a computer to do anything. These were neither pleasant nor easy to use, and were it not for the GUI, the incredible growth of personal computing might never have happened.
But while the last 40 years have been good to it, a replacement for the graphical user interface could be just around the corner. In the same way that the mouse and colour screens replaced keyboard-controlled commands, something most of us own and use every day could come to rule computing: our voices.
A computer that we can talk to has been the stuff of science fiction for decades, and more recently something of a novelty in our gadgets (we can ask our phones for the time or to play music), but the stars now appear to be aligning in a way that could make voice the next computing paradigm.
Why it could be voice’s time
One crucial advance has simply been computers’ ability to understand our voices. With a few exceptions (thick accents being one), software is now very capable of interpreting our speech as coherent sentences, rather than jumbles of unconnected words. So when I ask a computer “Will it rain on Thursday?”, it now almost never hears “Wheel it running thirsty”.
But the really difficult challenge has been building software that can actually respond to sentences in a useful way.
While pointing and clicking at menus leaves nothing open to interpretation, computers, for all their number-crunching prowess, have historically been very poor at understanding the nuances of language.
But the advent of machine learning has turned this on its head: in a short space of time, machines have become a lot better at grasping what sentences mean and how best to respond to a command or question.
These two developments, the ability to turn sounds into text and then to understand it, mean we are now able to hold conversations with computers, and the world’s tech giants are betting that people will increasingly want to do so.
Bots, software that can intelligently respond to sentences and hold conversations, are about to leap into the mainstream if you believe Microsoft and Facebook. Microsoft debuted a series of tools for building bots last week, and Facebook is expected to make them the centre of its annual developer conference next week.
Bots that can do proper work are now a real prospect. Whereas five years ago the best way to find a piece of information was to open a web browser and turn to Google, it is now often quicker and easier to pose a question to your virtual assistant.
Spoken queries to Google doubled last year, and are growing faster than typed ones. Siri, Google Now and Cortana – the voice-activated assistants built respectively by Apple, Google and Microsoft – have rapidly become far more sophisticated than when they were first introduced.
And Amazon’s Echo, a black cylindrical speaker without a screen or mouse, has become one of the online retailer’s top sellers. Activated by saying “Alexa” (the name of the Echo’s assistant) followed by a command (such as “Add milk to my shopping list” or “Play some country music”), the Echo was ridiculed when it was released in 2014, but it has since become a fixture in many American homes.
Getting over the awkward factor
To really offer an alternative to how we currently use our devices, though, voice has to get over the awkward factor.
Addressing one’s smartphone in public can still be embarrassing, especially in the loud and clear manner needed to make a sentence heard in a busy outdoor environment. And although we are well accustomed to talking on our phones, dictating a message or command when strangers are within earshot can feel uncomfortably exposing, no matter how innocent its content.
This unease is only heightened by the possibility that the voice recognition software may misinterpret one’s words.
But norms change. In many cases, voice is much quicker than typing. Its appeal is especially powerful in parts of the world where literacy rates are lower, in languages that don’t use the Roman alphabet around which keyboards were designed, and for those who struggle to use a mouse or touchscreen.
And as other household items, often too small or cheap to warrant a touchscreen, become connected, voice could prove a reliable alternative. The stamp-sized Apple Watch, for example, makes far more productive use of Apple’s Siri assistant than the iPhone does.
Voice also offers a host of possibilities for better hands-free computing: Amazon’s Echo has taken off as a device used when people are busy cooking, and car makers are rushing to fit their vehicles with voice-recognition technology that lets drivers keep both hands on the wheel.
Voice’s limitations mean it might not become the dominant computing platform soon, but if you want to talk, it will be listening.