Even with years of practice, it can be tricky for humans to understand the back-and-forth and nuances of conversations — we interrupt each other, misunderstand things, and sometimes have to repeat ourselves to get a message across. Now imagine how difficult it may be for artificial intelligence, which at its best is far less capable than any of us, to figure all that out and weigh in at the right time, too.
That’s what Amazon (AMZN) is trying to do with a new feature for its Alexa virtual assistant that it introduced on Thursday, alongside a bevy of products and features during an invitation-only online hardware event. Slated for release next year, it will allow an Alexa user to say “Alexa, join the conversation,” and the assistant will be able to weigh in throughout a multi-person discussion. It’s very different from how we typically interact with Alexa via individual voice applications, which Amazon calls “skills,” on smart speakers and other devices that use the virtual helper.
Amazon demonstrated the feature in a video shared Thursday during its event, in which two women coordinated a pizza order with Alexa, running on an Amazon Echo Show gadget on a kitchen island. Both women interacted with the assistant, at times talking over it and at others addressing it directly, for example by saying “That one!” when Alexa got to the preferred pizza topping combination, or by asking for a movie recommendation after deciding on a pizza. Though they asked questions that could have been addressed to each other (such as “Do you think a medium is going to be enough?” and “Is it a good movie?”), Alexa appeared to properly sort out which questions were meant for the humans versus those intended for the AI system.
This kind of human-like interaction between Alexa and actual humans isn’t easy to perfect, and Amazon has been working on it for a while. In July, Amazon showed off an early version of Alexa Conversations, which is meant to help Alexa skills developers create more human-like dialogue. This came more than a year after Amazon introduced the idea of combining several requests — movie tickets, a restaurant reservation, and a ride — into a single conversation between Alexa and one human (this particular capability launched in January).
The sort of turn-taking Amazon demonstrated Thursday, which can be used for conversations between multiple people, marks the next step toward Alexa becoming a capable conversation partner.
Rohit Prasad, vice president and head scientist for Alexa AI, told CNN Business that enabling Alexa to participate in natural-sounding turn-taking requires a number of steps and signals. First, Alexa has to detect speech; then, it has to figure out what it is that a person is saying and whether that utterance was actually directed at Alexa — something that’s more difficult than a typical Alexa interaction, because in this case you are only addressing Alexa directly when you invite it into the conversation. Alexa also must keep track of the history of a conversation in order to know what to suggest.
If Alexa is running on a device with a camera, like one of Amazon’s Echo Show gadgets, and Alexa has permission to access that camera, it will harness that as an additional cue: The camera can be used to estimate the pose of a person near it, in order to determine whether that person is speaking to Alexa or someone else in the room, Prasad said.
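To make the steps Prasad describes a little more concrete, here is a toy sketch in Python of how such signals might be combined. It is not Amazon's implementation: the class names, the `text_directedness` score, the 0.25 camera adjustment and the 0.5 cutoff are all hypothetical, chosen only to illustrate the idea of checking for speech, keeping a conversation history, and using an optional camera cue to decide whether an utterance was meant for the assistant.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Utterance:
    speaker: str
    text: str
    # Hypothetical score in [0, 1] from some upstream language model:
    # how likely the words alone are meant for the assistant.
    text_directedness: float
    # Hypothetical camera cue: True/False when a pose estimate is available,
    # None when there is no camera or no permission to use it.
    facing_device: Optional[bool] = None


@dataclass
class Conversation:
    # Running record of everything said, so later suggestions can use context.
    history: List[Utterance] = field(default_factory=list)
    threshold: float = 0.5  # arbitrary cutoff, an assumption for this sketch

    def directedness(self, utt: Utterance) -> float:
        """Combine the text-based score with the optional camera cue."""
        score = utt.text_directedness
        if utt.facing_device is True:
            score = min(1.0, score + 0.25)  # facing the device: nudge up
        elif utt.facing_device is False:
            score = max(0.0, score - 0.25)  # facing someone else: nudge down
        return score

    def handle(self, utt: Utterance) -> bool:
        """Return True if the assistant should respond to this utterance."""
        if not utt.text.strip():
            return False  # step 1: no speech detected, nothing to do
        self.history.append(utt)  # keep track of the conversation so far
        return self.directedness(utt) >= self.threshold
```

In this sketch, the same ambiguous question (say, a `text_directedness` of 0.4) would be answered when the speaker is facing the device and ignored when the speaker is facing someone else, while on a device with no camera the decision falls back on the text score alone.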
In the future, Prasad said, Alexa will be able to interrupt you, too. This might be helpful if, say, a conversation about which pizza to get or movie to watch turns into an argument.
“But it has to be really right in interrupting, otherwise you’ll get annoyed at her,” he said.