How easy is it to clone Prime Minister Narendra Modi's voice? Very. All you need is a laptop.
In an era where technology continually reshapes our digital landscape, one innovation that has quietly emerged at the forefront of human-computer interaction, promising unprecedented impact, is AI voice cloning.
As AI reduces the complex procedure of cloning a voice to a single click, the technology challenges our perceptions of what's real while raising important questions about privacy and authenticity. A multitude of AI-powered voice cloning software options have now made it feasible to replicate the voices of both the living and the deceased in a matter of minutes.
Decode tried its hand at a few of these tools, which are available both as free trials and paid versions. Although the output was discernible as a clone, we were able to clone Prime Minister Narendra Modi's voice within seconds. As scary as it may sound, any average person with little to no technical know-how can now create a cloned voice of anybody. This holds significant implications as we approach the upcoming elections, a period particularly susceptible to the proliferation of misinformation.
As these technologies get better and better, they also give politicians leeway to call everything a deepfake.
Voice cloning made easy
Play.ht is a text-to-speech service that can be used to train an AI model on any voice. The software can generate realistic human speech from written text in just a few minutes. Play.ht’s free plan provides a plausible voice clone; however, its upgraded plan, which starts at $39 per month, promises to clone a voice "encapsulating every dialect and accent".
Launched as a Chrome extension in 2016, the startup gradually developed into an AI-driven service whose current library comprises 100+ languages and accents. These include American English, British English, Australian English, Hindi, German, Chinese, Filipino and Turkish, among many others. In its 'Ethical AI & Safety' section, its website simply states, "We are dedicated to ensuring our Voice AI is used responsibly and safely."
Another text-to-speech AI tool, iMyFone VoxBox, offers a ready-made assortment of cloned voices of famous personalities, in addition to an AI voice cloning feature. These include the likes of Narendra Modi, Joe Biden, Donald Trump, Vladimir Putin, Mahatma Gandhi, Morgan Freeman, Joe Rogan, Kanye West and many more.
With operations in 249 countries, the software currently has over 320 million downloads, as per its website. Putting the onus on the user, the website states that "the users will take full responsibility, liability, and blame for any libel or litigation that results from something written in or as a direct result of something written in a comment (text prompt)."
Apart from these easily available AI services online, there is another place that can help one generate AI-cloned voices free of cost and rather easily: YouTube. Among the thousands of "How To" tutorials available on the platform, finding one that teaches you to clone voices with AI shouldn't come as a surprise.
Decode found more than a few content creators on YouTube who shared links to Google Colab notebooks in the descriptions of their tutorial videos, which explain the process of creating AI voice clones in just a few seconds. How does this work?
Google Colab, short for Google Colaboratory, is a free online platform provided by Google that allows you to write, run, and share code with others. It is particularly useful for people who want to work with code and data in a collaborative and cloud-based environment.
The Google Colab links provided in the YouTube descriptions lead to a code environment where one has to follow a few steps to clone a voice. First, install the recommended packages, then enter the desired prompt, and finally upload a sample of the voice that needs to be cloned.
'Personalised messages' using AI
AI voice cloning has already seeped into political communication as we approach the election season. Five Indian states, Rajasthan, Madhya Pradesh, Chhattisgarh, Telangana and Mizoram, are set to elect new legislatures this month, ahead of national elections due next year.
Divyendra Singh Jadoun's company, Polymath Solutions, is using this technology to create "personalised messages" for Congress party workers involved in the preparations for the forthcoming Rajasthan assembly elections. "In politics, every party worker is contributing their efforts voluntarily, what they truly desire is acknowledgment and appreciation from their political leaders," he said.
Jadoun receives a short script from the political party, which is then integrated into an AI voice model that sends personalised voice notes on WhatsApp in the voices of politicians. Apart from Congress, the incumbent BJP has also approached Jadoun's company for the same work. "We are still in talks with the party and will be providing the same service to them before the Lok Sabha elections of 2024," he said.
Talking about the ethical concerns associated with his work, Jadoun said that the company does not clone a voice until the political leader gives their consent. "Currently we will be only working for personalised messaging for party workers and not the voter base. We are mulling over the script in case we undertake voter base messaging, as we will be cautious of maintaining electoral decorum," he said.
AI and Politics: The menace of voice cloning
Speaking to Decode, Deepak P., associate professor at Queen's University Belfast, UK, said that the main danger of using AI in politics is that it could sway electoral power. He highlighted two significant implications. The first is the use of voice cloning to personalise messaging to parts of the electorate. "Imagine receiving a call or message from a national leader addressing local concerns in your language. Familiar national leaders' voices can pique curiosity, leading people to listen to automated messages, even if they habitually ignore them. This can inadvertently promote false promises," he said.
The second, according to him, is the use of voice cloning technology to show a leader of a rival party in a bad light. "In the highly communalised socio-political environment of India, even a short 10 second clip would be enough to whip up sectarian divides to polarise elections to the advantage of communal forces," he said.
Sachin Kalbag, senior fellow at the Takshashila Institution, told Decode that AI-driven software has an unprecedented capacity to spread disinformation rapidly and with significant consequences, unlike any method used in the past. He said, "It could potentially lead to violence and even death. In the context of elections, propaganda reigns supreme, and if a party decides to throw morals to the side and clone the voices of its opponents, millions of people would have watched it before any action could be initiated."
According to Kalbag, AI voice cloning software can produce misinformation far faster than fact-checkers can debunk it. "The damage would not only have been done, its impact would be so widespread that there could be violence, as has happened repeatedly in Manipur this year," he said.
Besides, Kalbag said, people don't receive fact-checked articles with the same enthusiasm as a video that validates their existing beliefs or biases. "It is a cognitive mind trick, and propagandists know this," he added.
Talking about potential ways of mitigating such risks, Deepak said that the enthusiasm for AI solutions that catch AI-generated misinformation is heavily misplaced. Explaining this, he said, "In simple terms, an in-house voice cloning detector is already part of the development of a voice cloning AI. The development of the voice cloning AI is considered successful when it can deceive the detector. Therefore, creating an effective voice cloning detector can enhance and refine the voice cloning AI by challenging it to outsmart the detector."
Furthermore, according to Kalbag, one cannot halt the march of technology. "Regulatory bodies and governments trying to ban or control such AI-based software will push it underground and make it even more desirable to those who want to damage the fabric of our society," he said.