Susan Bennett says she is the voice of the original U.S. version of Siri on Apple's iPhone
Apple won't comment, but other sources, including an audio-forensics expert, confirm this
Recordings from 2005 were used for Siri; hearing herself six years later was a surprise
How CNN's Jessica Ravitz, who had never used Siri, found Bennett is also shocking
For the past two years, she’s been a pocket and purse accessory to millions of Americans. She’s starred alongside Samuel L. Jackson and Zooey Deschanel. She’s provided weather forecasts and restaurant tips, been mocked as useless and answered absurd questions about what she’s wearing.
She is Siri, Apple’s voice-activated virtual “assistant” introduced to the masses with the iPhone 4S on October 4, 2011.
Behind this groundbreaking technology there is a real woman. While the ever-secretive Apple has never identified her, all signs indicate that the original voice of Siri in the United States is a voiceover actor who laid down recordings for a client eight years ago. She had no idea she’d someday be speaking to more than 100 million people through a not-yet-invented phone.
Her name is Susan Bennett and she lives in suburban Atlanta.
Apple won’t confirm it. But Bennett says she is Siri. Professionals who know her voice, have worked with her and represent her legally say she is Siri. And an audio-forensics expert with 30 years of experience has studied both voices and says he is “100%” certain the two are the same.
Bennett, who won’t divulge her age, fell into voice work by accident in the 1970s. Today, she can be heard worldwide. She speaks up in commercials and on countless phone systems. She spells out directions from GPS devices and addresses travelers in Delta airport terminals.
Until now, it’s been a career that’s afforded her anonymity.
But a new Apple mobile operating system, iOS 7, with new Siri voices means that Bennett’s reign as the American Siri is slowly coming to an end. At the same time, tech-news site The Verge posted a video last month, “How Siri found its voice,” that led some viewers to believe that Allison Dufty, the featured voiceover talent, was Siri. A horrified Dufty scrambled in response, writing on her website that she is “absolutely, positively NOT the voice of Siri,” but not before some bloggers had bought into the hype.
And there sat Bennett, holding onto her secret, laughing and watching it all. For so long she’d been goaded by others, including her son and husband, to come forward. Her Siri counterparts in the UK and Australia had revealed their identities, after all.
So why not her? It was her question to wrestle with, and finally she found her answer.
“I really had to weigh the importance of it for me personally. I wasn’t sure that I wanted that notoriety, and I also wasn’t sure where I stood legally. And so, consequently, I was very conservative about it for a long time,” she said. “And then this Verge video came out … And it seemed like everyone was clamoring to find out who the real voice behind Siri is, and so I thought, well, you know, what the heck? This is the time.”
The Siri surprise
The story of how Bennett became this iconic voice began in 2005. ScanSoft, a software company, was looking for a voice for a new project. It reached out to GM Voices, a suburban Atlanta company that had established a niche recording voices for automated voice technologies. Bennett, a trusted talent who had done lots of work with GM Voices, was one of the options presented. ScanSoft liked what it heard, and in June 2005 Bennett signed a contract offering her voice for recordings that would be used in a database to construct speech.
For four hours a day, every day, in July 2005, Bennett holed up in her home recording booth. Hour after hour, she read nonsensical phrases and sentences so that the “ubergeeks” – as she affectionately calls them; they leave her awestruck – could work their magic by pulling out vowels, consonants, syllables and diphthongs, and playing with her pitch and speed.
These snippets were then stitched back together through a process called concatenative synthesis, which builds new words, sentences and paragraphs from the recorded pieces. And that is how voices like hers find their way into GPS and telephone systems.
“There are some people that just can read hour upon hour upon hour, and it’s not a problem. For me, I get extremely bored … So I just take breaks. That’s one of the reasons why Siri might sometimes sound like she has a bit of an attitude,” Bennett said with a laugh. “Those sounds might have been recorded the last 15 minutes of those four hours.”
But Bennett never knew exactly how her voice would be used. She assumed it would be employed in company phone systems, but beyond that didn’t think much about it. She was paid by the hour – she won’t say how much – and moved on to the next gig.
The surprise came in October 2011 after Apple released its iPhone 4S, the first to feature Siri. Bennett didn’t have the phone herself, but people who knew her voice did.
“A colleague e-mailed me [about Siri] and said, ‘Hey, we’ve been playing around with this new Apple phone. Isn’t this you?’”
Bennett went to her computer, pulled up Apple’s site and listened to video clips announcing Siri. The voice was unmistakably hers.
“Oh, I knew,” she said. “It’s obviously me. It’s my voice.”
It certainly does sound like Bennett. But proving who supplied the voice of Siri isn’t easy. It’s not like Steve Jobs sent Bennett a thank-you note, or a certificate to hang on her wall.
There are others who vouch for her. But the tech world – and specifically the text-to-speech, or TTS, space – is a complicated business, one that’s shrouded in secrecy and entangled in a web of nondisclosure agreements.
Bennett is not bound by such restrictions, which is why she’s talking. But companies in the industry have a vested interest in keeping their voices anonymous.
“The companies are competing to create the best-sounding and functioning systems. Their concern is driving revenues,” said Marcus Graham, CEO of GM Voices. “Talking about the voice talent, from their perspective, is likely seen as a distraction.”
Bennett’s attorney, Steve Sidman, can’t breach attorney-client privilege to share documents and contracts, but since he began representing Bennett in 2012 he’s been intensely aware of her connection to Siri.
“I’ve engaged in substantial negotiations – multiple, months-long negotiations – with parties along the economic food chain, so to speak, that involved her rendering services as the voice of Siri,” he told CNN. “It’s as simple as that.”
And then there’s Graham, of GM Voices, a man who has built a career around providing voiceover talent for interactive voice technologies.
Graham won’t divulge details about any deals he made back in 2005. But he has worked with Bennett for 25 years, has recorded “literally millions of words with Susan” and has installed her voice with clients across the globe. He knows her voice as well as anyone, and he doesn’t hesitate when asked if she and Siri are the same.
“Most female voices are kind of thin, but she’s got a rich, full voice,” he said. “Yes, she’s the voice of Siri. … She’s definitely the voice.”
A ‘100% match’
In October 2005, a few months after Bennett made those recordings, ScanSoft bought and took on the name of Nuance Communications. Nuance is widely accepted as the company that provided Apple with the technology behind Siri.
When CNN contacted Nuance to try to confirm Bennett’s identity as a voice of Siri, a Nuance spokeswoman said, “As a company, we don’t comment on Apple.”
Apple, too, declined to comment.
So CNN took the investigation one step further by hiring an audio forensics expert to compare Bennett’s voice with Siri’s.
Ed Primeau, of Rochester Hills, Michigan, has been doing this work for three decades. He’s testified in courts, analyzed “hundreds, if not thousands” of recordings and is a member of the American Board of Recorded Evidence. He spent four hours comparing our “known voice” – in this case Siri – with the unknown voice of Bennett.
“I believe, and I’ve lived this for 30 years, no two voices are the same,” he said, after finishing his analysis of the Siri voice and Bennett’s. “They are identical – a 100% match.”
To reach his conclusion, Primeau created back-to-back comparison files, isolated and listened to consonants, and reviewed deliveries. He removed the hiss that recording from a phone had added to the Siri audio, then dropped the cleaned clip into Bennett’s file.
After studying Bennett’s normal speaking voice, he was about 70% certain of the match. But once he had audio of her saying the same words as Siri, he knew his work was done. Even so, he said he asked a colleague for a second opinion.