by Robert Wolverton
“It’s voice activated, but you know, everything is voice activated these days.”
Science fiction has a track record of being fairly prophetic; Captain Kirk querying the ship’s ‘computer’ on Star Trek is a good example. But I didn’t expect a children’s animated movie to define the future. As it turns out, Holley Shiftwell was speaking the future into existence in the quote above from Disney Pixar’s Cars 2 in 2011. It’s amazing how quickly fantasy can evolve into everyday reality.
Let’s take a look at how fiction has morphed into fact with voice-activated technologies, where we stand today, and where we’re likely headed. Most importantly, let’s consider how voice-activation technology could impact your business.
In the early 1960s, IBM introduced one of the world’s first voice recognition systems. It was called Shoebox [1], and it could recognize sixteen spoken words, including the digits 0 through 9. Not very impressive by today’s standards, but certainly a dramatic breakthrough for its time.
In the 1970s, non-fictional humans began to have routine voice-activated interactions with computers. This is when the first Voice Response Units (VRUs) appeared. VRUs are telephone routing systems organized around a “navigation tree” and capable of recognizing a set of specific commands. Early VRUs supported limited, one-way interactions: the caller voiced a command, and the computer executed a pre-defined operation.
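For the technically curious, here is a minimal sketch (in Python, with invented menu labels and action names) of the “navigation tree” idea behind a VRU: each recognized command walks the caller one level deeper until a pre-defined operation is reached.

```python
# Minimal sketch of a VRU-style "navigation tree" (illustrative only;
# menu labels and action names are made up). Each node maps a recognized
# spoken command to either a deeper menu or a pre-defined action.

NAV_TREE = {
    "billing": {
        "balance": "play_balance_recording",
        "payment": "route_to_payment_line",
    },
    "support": {
        "outage": "play_outage_status",
        "agent": "route_to_human_agent",
    },
}

def handle_call(spoken_commands):
    """Walk the tree one recognized command at a time."""
    node = NAV_TREE
    for command in spoken_commands:
        node = node.get(command)
        if node is None:
            return "replay_menu"      # command not recognized at this level
        if isinstance(node, str):
            return node               # reached a pre-defined operation
    return "replay_menu"              # caller stopped mid-tree

print(handle_call(["billing", "balance"]))  # -> play_balance_recording
```

Notice there is no understanding here at all, only a lookup; that rigidity is exactly what later systems set out to overcome.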
Though VRUs were a revolutionary technology, they supported only limited human-computer voice interactions. Around 2000, the technology took a significant step forward with applications such as TellMe that enabled more conversation-like interactions.
Although TellMe and competing products advanced VRU technology, it remained fairly limited. In general, it simply routed calls to the proper human, who would then complete the requested task.
The old Star Trek series was quite on-target with its concept of the future of human-computer interactions. But it didn’t get everything right. The future, it turned out, was only a few decades away, not a few centuries. And we’re not speaking to an inanimate ‘computer’ – rather, we’re conversing with a very human-like persona named Alexa or Siri.
Services like Alexa and Siri are powered by a seemingly magical technology called Natural Language Understanding (NLU). This revolutionary technology gives computers the ability to understand spoken phrases, discern the speaker’s intent, and respond in kind with a human-sounding voice.
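To make that a little more concrete, here is a deliberately simplified sketch of the pipeline such services follow: take an utterance, classify its intent, pull out any parameters, and generate a spoken reply. Real assistants use trained machine-learning models for each step; the keyword matching and intent names below are purely illustrative stand-ins.

```python
# Simplified illustration of an NLU flow: utterance -> intent -> response.
# Real services use trained language models; keyword rules stand in here.

def detect_intent(utterance):
    """Guess the intent of a phrase and extract simple parameters ('slots')."""
    text = utterance.lower()
    if "weather" in text:
        return "GetWeather", {"location": "current"}
    if "play" in text and "music" in text:
        return "PlayMusic", {}
    return "Unknown", {}

def respond(intent, slots):
    """Turn an intent into the text the assistant would speak back."""
    if intent == "GetWeather":
        return "Here is today's forecast for your area."
    if intent == "PlayMusic":
        return "Playing your music."
    return "Sorry, I didn't catch that."

intent, slots = detect_intent("What's the weather like today?")
print(respond(intent, slots))  # -> Here is today's forecast for your area.
```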
This technology has progressed far beyond the ability to simply answer questions. As Grant Clauser of Wirecutter.com recently noted [2], “Using nothing but the sound of your voice, you can play music, search the Web, create to-do and shopping lists, shop online, get instant weather reports, and control popular smart-home products.”
As illustrated in the graphic below, the skill set of NLU-powered devices grows much the way a smartphone’s does through apps: new capabilities (called ‘skills’) can be plugged into the system as they are developed.
The open SDK architecture of these devices enables rapid expansion of their capabilities; Alexa already boasts more than 15,000 skills! Additionally, some NLU-powered devices, like the Amazon Echo Show, provide an audio-visual interface, enabling richer, more varied interactions.
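The plug-in model is easier to appreciate with a sketch. The snippet below is not the actual Alexa SDK; it is a hypothetical registry, written in Python, that shows the pattern: developers register handlers for named intents, and the platform routes each recognized request to the matching skill.

```python
# Hypothetical skill registry illustrating how new capabilities ("skills")
# plug into an NLU platform. Not the real Alexa SDK; names are invented.

SKILLS = {}

def skill(intent_name):
    """Decorator that registers a handler for a named intent."""
    def register(handler):
        SKILLS[intent_name] = handler
        return handler
    return register

@skill("OrderPizza")
def order_pizza(slots):
    return f"Ordering a {slots.get('size', 'medium')} pizza."

@skill("CheckWeather")
def check_weather(slots):
    return "It looks sunny today."

def dispatch(intent_name, slots):
    """Route a recognized intent to whichever skill handles it."""
    handler = SKILLS.get(intent_name)
    return handler(slots) if handler else "No skill installed for that request."

print(dispatch("OrderPizza", {"size": "large"}))  # -> Ordering a large pizza.
```

Every new skill is just another entry in the registry, which is why the catalog can grow from dozens to 15,000 without any change to the platform itself.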
NLU technology has enabled an impressive range of voice-controlled capabilities, including everyday or frequent tasks such as checking the weather, setting timers and reminders, playing music, building shopping lists, shopping online, and controlling smart-home devices.
One near-universal limitation of NLU devices has been that they operate in only one language. But that limitation is rapidly changing; multi-language capability is on the rise. (After all, Star Trek’s computer could surely handle a command in Swahili, or Klingon, or any other known language.)
Google essentially won the search engine war and is now the default point of entry to the “printed” web. But that victory may turn out to be short-lived. Voice-activation technology has, in effect, leveled the playing field once again. There’s a new battle for home turf, and the winner will steer where consumers end up by becoming the predominant point of entry. The rewards will be substantial; companies will pay lucrative “slotting fees” to be placed among the top two or three choices triggered by a specific question or phrase. And the list of choices will stay short, because audio does not lend itself to long menus of options; listeners quickly become overwhelmed or tune out.
The point of entry could easily influence consumer buying patterns. Let’s say you want to bake chocolate chip cookies this weekend and you “mention it” to your new NLU-enabled assistant, which pulls up the chocolate chip cookie recipe on your phone, interacts with your connected refrigerator to check its inventory, and automatically orders the extra butter and eggs you need. (How smart is Amazon’s acquisition of Whole Foods looking now?)
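Under the hood, that scenario is a small orchestration: pull the recipe, compare it against the refrigerator’s inventory, and order the difference. The sketch below uses invented functions (get_recipe, fridge_inventory, place_order) and made-up quantities purely to show the flow; no real smart-fridge or grocery API is implied.

```python
# Illustrative orchestration for the cookie scenario. All functions and
# quantities are hypothetical stand-ins for recipe, smart-fridge, and
# grocery-ordering services.

def get_recipe(name):
    return {"butter": 2, "eggs": 2, "chocolate chips": 1, "flour": 3}

def fridge_inventory():
    return {"butter": 1, "eggs": 0, "flour": 5}

def place_order(items):
    print("Ordering:", items)

def handle_request(recipe_name):
    needed = get_recipe(recipe_name)
    have = fridge_inventory()
    # Order only what the recipe needs beyond what's already on hand.
    missing = {item: qty - have.get(item, 0)
               for item, qty in needed.items()
               if have.get(item, 0) < qty}
    if missing:
        place_order(missing)
    else:
        print("You already have everything you need.")

handle_request("chocolate chip cookies")
# -> Ordering: {'butter': 1, 'eggs': 2, 'chocolate chips': 1}
```

Whoever owns that orchestration also decides which grocer gets the order, which is exactly why the point of entry matters so much.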
This is just the tip of the iceberg. Companies will be jostling to pay exorbitant slotting fees to outrank competitors for their slice of the new pie.
Though only relatively simple transactions are supported by this technology today, the complexity of supported transactions will increase rapidly. Voice alone will not be enough; audio-only interfaces impose substantial limits on the breadth and depth of what can be communicated.
We’ll continue to move away from text-driven interfaces toward voice-activated ones, but a visual component will play an increasing role in the interaction, likely relying on icons and emoji-style images to make exchanges even faster and easier. The interface will become literal child’s play. Imagine your three-year-old ordering his own Hot Wheels playset; that’s where we’re headed (for better or worse!).
Machine learning and artificial intelligence (sitting atop big-data environments) will greatly expand the capabilities of these devices over the next few years. Notice I did not say five to ten years; the technology horizon keeps moving ever closer to the present.
The predominant services (currently Alexa, Google Assistant, Siri, and Cortana) will serve as the entry points and will earn commissions on everything you purchase. The big winners among these competing services will be those that can:
At present, it’s true that only simple transactions are handled through these interfaces. But don’t make the mistake of assuming that this will long remain the case. Doing so will foster a sense of complacency that puts you at risk of falling far behind your competition. As one CIO recently told me, “with the speed of adoption we’re seeing for these devices, they may become the expected norm by the time we reach 2019.”
“Voice is the future of customer and enterprise engagement. It will become a brand differentiator over the next 3 years as NLUs become more sophisticated and machine learning becomes truly dynamic,” says Bob Lautenbach, CTO of Voceio, a company at the forefront of the voice-first revolution that provides voice solutions for the hospitality, cruise, and property management sectors.
To avoid a “last one to the dance” calamity, it’s important to think through the potential impacts on your business and make a plan to compete effectively in this new age. Do some brainstorming: consider all the products and services you offer and the related customer interactions and channels. Think about how voice activation and NLU technology will impact:
And remember that as other companies demonstrate the conveniences these technologies offer, your customers will expect you to offer the same. Do so, and they’ll be happy to remain your customers. Failure to keep pace may leave you wondering where your customers went. Make a plan and begin executing because “you know, everything is voice activated these days.”
1. IBM Archives: IBM Shoebox. https://www.ibm.com/ibm/history/exhibits/specialprod1/specialprod1_7.html
2. Clauser, G. Wirecutter, The New York Times (2023, May 17). Retrieved from www.nytimes.com/wirecutter/reviews/best-alexa-speakers