(CNN Business) —  

It could be any ad on YouTube: A blonde model playfully puts her hand in front of the camera lens, dons white sunglasses and flashes a grin. In the background, hip-hop music plays while an unmistakably female voice says, “Fashion changes, but style lasts forever.”

The ad — part of a demo reel on YouTube created by a new startup called WellSaid Labs — is short and slick. But something is a bit different. While the model you see is a human, the background voice you hear only sounds like one.

The Seattle-based company is using voice actors and artificial intelligence to create synthetic voices that sound a heck of a lot like people. The company claims the text-to-speech software it has been working on for the past year can produce audio that sounds more human-like than other synthetic voices. The reason, according to the company, is that it is not tightly controlling different variables of speech like speed, pronunciation, and volume when training its voice model.

“The voice we’re trying to create here is super expressive and lifelike in its final result,” WellSaid Labs CEO Matt Hocking told CNN Business.

Computerized voices seem to be everywhere these days, offering news from a smart speaker in your living room or giving you turn-by-turn directions in the car. Yet Alexa, Siri, Google Assistant and others that you’re likely to hear from still tend to speak in stilted, robot-tinged voices. (A notable exception, Google Duplex, can call some businesses to make reservations with an impressively human-sounding AI-enabled voice; Google is making it increasingly available, but you’d have to be on the receiving end of a phone call, at a restaurant for instance, to hear it.)

WellSaid Labs isn’t planning to take over the voice-assistant market, though. Rather, Hocking said, it hopes to sell the voices to companies that want to use them in advertising, marketing and e-learning courses.

The company says it’s building a number of human-like voices that customers will be able to use, and hopes to work with voice actors to create different data sets that can be used to create all kinds of artificial voices.

You’ve probably heard of stock photos; you might think of this as stock voices.

To make the woman’s voice in the faux ads, WellSaid Labs first had a voice actor read articles from Wikipedia. These recordings formed a data set that it used to train an artificial neural network — a computing system whose structure is modeled loosely after neurons in a brain.

Another online demo shows how similar the AI-generated voices can sound to the actors, with audio alternating between two almost indistinguishable voices — one the human voice-over actor, one her AI-generated voice — that sound like a middle-aged woman. You might occasionally notice some differences, but they’re slight; the emphasis you’d expect might be off by just a bit in a word, for instance.

The startup said it doesn’t need to pre-process or annotate text given to the software for it to be able to do things like emphasize words in a natural-sounding way — something that is difficult for an artificial voice to do without help (though companies such as Google have been working on it). And if you fed the same text to its text-to-speech generator twice, you’d get different results.

It takes about four seconds to render a line of text right now, said chief technology officer Michael Petrochuk. The model isn’t built to interpret long pieces of text, though: it can be used to speak several sentences, but the text of an entire CNN Business article, for example, would need to be cut into pieces before it could be analyzed and spoken by a WellSaid Labs voice. (The company made one of its voices speak the headline and first paragraph of this story — take a listen and see what you think.)
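
To illustrate what cutting a long text into pieces could look like, here is a minimal Python sketch. It is not WellSaid Labs' software or API: the function names, the naive sentence splitter, and the choice to treat each chunk as one "line" at roughly four seconds of rendering time are all assumptions made for illustration.

```python
import re

# Assumption: the article's "about four seconds to render a line" is applied
# per chunk here. The real cost per chunk is not stated in the article.
SECONDS_PER_CHUNK = 4


def split_into_chunks(text, max_sentences=3):
    """Split text into chunks of at most `max_sentences` sentences each.

    Hypothetical helper: uses a naive rule (break after '.', '!' or '?'
    followed by whitespace), which will mis-split abbreviations like 'Dr.'.
    """
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", text.strip())
        if s.strip()
    ]
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


def estimated_render_seconds(chunks):
    """Rough render-time estimate under the per-chunk assumption above."""
    return len(chunks) * SECONDS_PER_CHUNK


if __name__ == "__main__":
    sample = (
        "It could be any ad. A model grins. Music plays. "
        "A voice speaks. Then it ends."
    )
    pieces = split_into_chunks(sample, max_sentences=2)
    for piece in pieces:
        print(piece)
    print(estimated_render_seconds(pieces), "seconds (estimated)")
```

Each chunk would then be handed to a text-to-speech system separately and the resulting audio joined in order. A production splitter would need smarter sentence detection than this one.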

It’s hard to make a synthetic voice sound consistently good. Alan Black, a professor of language technologies at Carnegie Mellon University, said that the ones we’re familiar with, such as Amazon’s Alexa, sound robotic because it’s tricky to make them sound natural in all situations. It’s difficult, he said, to give the right amount of information to a speech synthesizer so it can respond with the right amount of feeling.

“We don’t have a little knob on our synthesizer to say ‘Do feeling 87%,’” he said.

He listened to some of WellSaid Labs’ demo voices, and thought they sounded “pretty good.”

But if artificial voices sound close to — or indistinguishable from — humans, should listeners be clued in that they’re not listening to a real person talk? After Google demonstrated Duplex in 2018 with a call that its human-sounding AI made to a Bay Area restaurant, the tech company was criticized for not having the AI disclose what it was.

Black doesn’t think that disclosure is necessary, at least in the context of ads.

“I think that in general most people are relatively aware that what they see in video and audio is in some sense processed,” he said. “They know that when they’re watching ‘The Lord of the Rings’ there really aren’t a lot of orcs in New Zealand appearing in the movie.”