San Francisco CNN Business —  

Big tech companies don’t like to talk about it. And when users find out it’s happening, they’re often surprised — and disturbed.

Yes, if you talk to a virtual assistant, such as Amazon’s Alexa, a human may listen to a recording of your chatter.

Recent reports have highlighted what is actually a longstanding practice meant largely to improve the artificial intelligence that underpins the virtual assistant-powered gadgets and services that are popping up throughout people’s homes and lives.

The practice raises privacy concerns for smart-speaker users in particular, who might have known that Amazon, Google, and Apple create recordings each time you speak to Alexa, Google Assistant, and Siri, respectively, but not that people might review them.

The companies have said only a small percentage of recordings are listened to by humans. Still, Google and Apple have temporarily halted human reviews of their recordings, while Amazon recently changed its settings to make it easier for people to avoid such review altogether. Last week, Facebook said it, too, had paused human review of some users’ audio clips, such as those sent as audio messages via the social network’s Messenger app. Facebook had been using humans to listen in, as part of an AI-transcription feature.

An Amazon Echo Plus smart speaker photographed on a kitchen counter, taken on January 9, 2019. (Photo by Olly Curtis/Future via Getty Images)

Lost in the shuffle of these revelations is whether people are truly needed to make these AI-dependent systems work, and how much companies should tell users about this process.

Numerous experts in AI, ranging from academics to startup entrepreneurs, told CNN Business that there is a legitimate need to listen to some snippets of conversation in order to make all kinds of voice-operated technology work, whether it’s a smart speaker in your living room or a virtual server at a drive-through restaurant.

At the same time, they think tech companies should do much more to make it clear what happens to any recordings from these systems, and what risks there may be to your privacy.

“If you think about it, why would you want a stranger in your home, listening to your private conversations?” asked Mainul Mondal, founder and CEO of San Francisco-based startup Ellipsis Health, which uses AI to analyze conversations patients have with doctors and other healthcare providers.

AI is not magic

Virtual assistants are powered by machine learning algorithms, which comb through massive amounts of data, searching for patterns. To work, these assistants need to be trained on lots of data, in this case lots of conversations.

It takes about 20,000 hours of audio to train an assistant that can be rolled out to users, according to Jason Mars, CEO and cofounder of Clinc, an Ann Arbor, Michigan-based startup that builds conversational assistants for banks and other companies.

These assistants can be trained to do all kinds of tasks, like telling you what the weather is or playing a song on command. Still, people remain vital not just for providing the information to train these systems in the first place, but also for helping them improve over time, which is why someone somewhere might be listening to a recording of you asking Alexa to play “Truth Hurts” by Lizzo.

Justine Cassell, a professor of language technology at Carnegie Mellon University, said humans are “essential” for making AI-powered, voice-controlled products, since the technology is still bad at figuring out how people talk: that is, how we choose the way to say what we say (which can vary depending on whether we’re talking to, for instance, a coworker or a family member).

AI is also not good at figuring out how to respond to us appropriately. For instance, if you ask Siri to tell you a joke, and you respond to the punchline by saying, “That’s not very funny,” it will respond with a robotic, “I’m not sure I understand.”

“Machines are not good at this, and that’s why people listen to human speech: because we’re really good at classifying that kind of stuff,” Cassell said. “I know whether, to some extent, you feel positive about what I’m saying or negative about what I’m saying.”

Companies need to do more work

Another issue is that people tend to personify the devices, said Florian Schaub, an assistant professor at the University of Michigan who has studied people’s privacy perceptions when it comes to smart speakers. As a result, they’re not really thinking about the fact that they’re sending a query to Amazon when they utter a command to Alexa. Realizing that a person somewhere might hear what you say feels a bit violating, he said, and could make users wonder what else these companies know about them. He often hears that, rather than using built-in privacy controls (such as a physical mute button that many smart speakers have to stop the device from capturing anything you say), people just unplug the device.

“I think these companies need to do more work to figure out what are really people’s concerns,” Schaub said.

Companies, he believes, aren’t properly communicating what they’re doing with user data in the first place, and they could do more to talk about privacy risks users face, as well as how they’re protecting users’ data.

Apple, Amazon, and Google declined to comment beyond issuing statements and, in Google’s case, a blog post, that point out the companies’ commitments to user privacy.

Cassell and other experts suggested that companies that want to analyze users’ utterances first ensure user data is anonymized, which is a common practice among companies. She also thinks they should give employees monitoring such audio the same kind of ethics-focused training that academics get before they can conduct studies with humans, and let users know about it, too.

Humans have to be in the loop

Will technology you talk to ever become good enough to make human review unnecessary?

It’s not likely, at least not in the foreseeable future.

If the goal of these companies is to eventually have voice assistants that we can have natural-sounding conversations with, rather than the kind of stilted back-and-forth that typifies these interactions today, we will always need some manual review, Schaub said.

Mars and other experts agreed.

“We believe we have a lot of hard work to do before you can talk to these systems like a human in a room,” Mars said.