What Happened When I Cloned My Own Voice

Recently my colleague Charlie Warzel, who covers technology, introduced me to the most sophisticated voice-cloning software available. It had already been used to clone President Joe Biden’s voice to create a fake robocall discouraging people from voting in the New Hampshire primary. I signed up, fed it a few hours of audio of me speaking on various podcasts, and waited for the Hanna Rosin clone to be born. The way it works is you type a sentence into a box. For example, Please give me your Social Security number, or Jojo Siwa has such great fashion!, and then your manufactured voice, created from samples of your actual voice, says the sentence back to you. You can make yourself say anything, and shift the intensity of the intonation until it sounds uncannily like you.


Warzel visited the small company that made the software, and what he found was a familiar Silicon Valley story. The people at this company are dreamers, inspired by the Babel fish, a fictional translation device from The Hitchhiker’s Guide to the Galaxy. They imagine a world where people can speak to one another across languages and still sound like themselves. Warzel spoke to them about the less dreamy possibilities of voice-cloning software: scams, misinformation, and election interference. And he came away with the impression that they were aware of the dangers. But once the technology is out, nobody can quite predict every variety of world-altering chaos, particularly in a year when over half the world’s population will vote in an election.


In this episode of Radio Atlantic, Warzel and I discuss how this small company perfected the cloned voice, and what good and bad actors might do with it. Warzel and I spoke at a live show in Seattle, which allowed us to play a few tricks with the audience.


Hanna Rosin: So a few weeks ago, my colleague, staff writer Charlie Warzel, introduced me to something that’s either amazing or sinister—probably both.


Charlie’s been on the show before. He writes about technology. And most recently, he wrote about AI voice software. And I have to say: It’s uncannily good. I signed up for it—uploaded my voice—and, man, does it sound like me.


So, of course, what immediately occurred to me was all the different flavors of chaos this could cause in our future.


I’m Hanna Rosin. This is Radio Atlantic. And this past weekend, I was in Seattle, Washington, for the Cascade PBS Ideas Festival. It’s a gathering of journalists and creators, and we discussed topics ranging from homelessness to the Supreme Court to the obsession with true crime.


Charlie and I talked about this new voice software. And we tried to see if the AI voices would fool the audience.


For this week’s episode, we bring you a live taping with me and Charlie. Here’s our conversation.


Rosin: So today we’re going to talk about AI. We’re all aware that there’s this thing barreling toward us called AI that’s going to lead to huge changes in our world. You’ve probably heard something, seen something about deepfakes. And then the next big word I want to put in the room is election interference.


Today, we’re going to connect the dots between those three big ideas and bring them a little closer to us because there are two important truths that you need to know about this coming year. One is that it is extremely easy—by which I mean ten-dollars-a-month easy—to clone your own voice, and possibly anybody’s voice, well enough to fool your mother. Now, why do I know this? Because I cloned my voice, and I fooled my mother. And I also fooled my partner, and I fooled my son. You can clone your voice so well now that it really, really, really sounds a lot like you or the other person. And the second fact that it’s important to know about this year is that about half the world’s population is about to undergo an election.


So those two facts together can lead to some chaos. And that’s something Charlie’s been following for a while. Now, we’ve already had our first taste of AI-voice election chaos. That came in the Democratic primary. Charlie, tell us what happened there.


Charlie Warzel: A bunch of New Hampshire voters—I think it was about 5,000 people—got a phone call that said “robocall” when they picked it up, which is standard if you live in a state holding a primary. And the voice on the other end of the line was this kind of grainy-but-real-sounding voice of Joe Biden urging people not to go out and vote in the primary that was coming up on Tuesday.


Rosin: Let’s, before we keep talking about it, listen to the robocall. Okay? We’re going to play it.


Joe Biden (AI): Republicans have been trying to push nonpartisan and Democratic voters to participate in their primary. What a bunch of malarkey. We know the value of voting Democratic when our votes count. It’s important that you save your vote for the November election. We’ll need your help in electing Democrats up and down the ticket. Voting this Tuesday only enables the Republicans in their quest to elect Donald Trump again. Your vote makes a difference in November, not this Tuesday.


Rosin: I’m feeling like some of you are dubious, like that doesn’t sound like Joe Biden. Clap if you think that does not sound like Joe Biden.


Rosin: Well, okay. Somewhere in there. So when you heard that call, did you think, Uh-oh. Here it comes? Like, what was the lesson you took from that call? Or did you think, Oh, this got solved in a second and so we don’t have to worry about it?


Warzel: When I saw this, I was actually reporting out a feature for The Atlantic about the company ElevenLabs, whose technology was used to make that phone call. So it was very resonant for me.


You know, when I started writing—I’ve been writing about deepfakes and things like that for quite a while (I mean, in internet time), since 2017. But there’s always been this feeling of, you know, What is the actual level of concern that I should have here? Like, What is theoretical? With technology, and especially with misinformation, we tend to, you know, talk and freak out about the theoretical so much that sometimes we’re not really grounding it in plausibility.


So with this, I was actually trying to get a sense of: Is this something that would actually have any real sway in the primary? Like, did people believe it? Right? It’s sort of what you just asked the audience, which is: Is this plausible? And I think when you’re sitting here, listening to this with hindsight, and, you know, trying to evaluate, that’s one thing.


If you’re getting that call at this moment in time—especially if you aren’t paying close attention to technology—are you really gonna question it? Are you really gonna be thinking about that? This software is still working out some of the kinks, but I think the believability has crossed a threshold that is alarming.