Like a turkey dozing off when talk turns to Christmas, I confess to tuning out when talk turns to AI. Or rather I used to, until a few weeks ago. Before then, AI seemed vital and foreboding, yet somehow also remote and incomprehensible. But now my attention is hooked. The difference lies in the no-longer-unique sound of the human voice.
Deepfake vocal clones are here. The technology behind them isn't new, but rapid advances in accuracy and availability have made AI-generated voice copying go viral this year. Microsoft's Vall-E software claims to be able to mimic a person based on just three seconds of audio. Although it hasn't yet been released to the public, others with similarly powerful capabilities are easily obtained.
A flashpoint came in January when tech start-up ElevenLabs released a powerful online vocal generator. Faked voices of celebrities immediately flooded social media. Swifties on TikTok concocted imaginary inspirational messages from Taylor Swift ("Hey it's Taylor, if you're having a bad day just know that you are loved"). At the other end of the spectrum, 4chan trolls created fake audio clips of celebrities saying hateful things.
Other voice generators duplicate singing as well as speech. Among the countless mock-ups circulating on social media is a synthetic but convincing-sounding Rihanna covering Beyoncé's "Cuff It". Digitally resurrected foes Biggie Smalls and Tupac Shakur make peace in a jointly rapped version of Kanye West and Jay-Z's "N****s in Paris". David Guetta made an AI Eminem voice rapping about a "future rave sound" for a live DJ set. Referring to what he called his "Emin-AI-em" creation, he explained afterwards that "obviously I won't release this commercially".
In April, a track called "Heart on My Sleeve" became the first voice-clone hit, notching up millions of streams and views. Purportedly made by a mysterious figure called Ghostwriter, it's a duet featuring AI-generated versions of Canadian superstars Drake and The Weeknd.
The lyrics resemble a bad parody of the pair's real work. "I got my heart on my sleeve with a knife in my back, what's with that?" the fake Drake raps, evidently as mystified as the rest of us. But the verisimilitude of the vocals is impressive. So realistic are they that there has been groundless speculation that the whole thing is an elaborate publicity stunt in which the two acts are pretending to be their own AI-created avatars.
"Heart on My Sleeve" was removed from streaming platforms after a complaint from the artists' label, Universal Music Group, although it's simple enough to find online. A murky legal haze covers vocal cloning. The sound of a singer's voice, its timbre, doesn't have the same protection in law as the words and melodies they're singing. Their voice might be their prize asset, but its sonic frequency isn't theirs to copyright. Depending on its use, it appears that I am at liberty to make, or try to make, an AI model of my favourite singer's inimitable tones.
Unlike the famous rappers and pop stars who are the typical targets for cloning, my choice is a vintage act: Tom Waits, a gravelly mainstay of my musical life since my student days.
Now 73, the Californian singer-songwriter released his first album 50 years ago. His songs have been succinctly characterised by his wife and collaborator Kathleen Brennan as either "grim reapers" that clank and snarl and brawl or "grand weepers" that serenade and bawl. Take note, AI Drake and AI The Weeknd, this is real heart-on-sleeve stuff.
Aside from my being a fan, a reason to pick him is his distinctive singing style, a cataract roar to rival Niagara Falls. Another is the frustrating absence of any new music from him: his most recent album came out in 2011. I therefore set myself the challenge of using online generative tools to create a surrogate for the real thing, a new song that will endeavour to put the AI into Tom Waits.
As with any unfamiliar task these days, the first port of call is a YouTube tutorial. There I find a baseball-hatted tech expert from the US, Roberto Nickson, demonstrating the power of voice generators with an uncanny Kanye West impression that went viral at the end of March. He chose the rapper's voice because he's a fan, but also because it was the best voice model he could find at the time.
Set to a Ye-style beat that he found on YouTube, Nickson's Ye-voiced verses make the rapper seem to apologise for his shocking antisemitic outbursts last year. "I attacked a whole religion all because of my ignorance," Nickson raps in the vocal guise of Kanye. (In reality, the rapper offered a sorry-not-sorry apology last year in which he said he didn't regret his comments.)
"When I made that video, these machine-learning models were brand new," Nickson tells me in a video call, sitting behind a microphone in his filming studio in Charlotte, North Carolina. The 37-year-old is a tech entrepreneur and content creator. He came across the Kanye voice model while browsing a Ye-inspired music-remix forum called Yedits on the internet site Reddit.
"It was a novelty, no one had seen it," he says of the AI-generated Ye voice. "Like, the tutorial had about 20 views on YouTube. And I looked at it and went, 'Oh my God.' The reason I knew it was going to be huge wasn't just that it was novel and cool, but also because the copyright conversation around it is going to change everything."
Ethical questions are also raised by voice cloning. Nickson, who isn't African-American, was criticised online for using a black American voice. "I had a lot of comments calling it digital blackface. I was trying to explain to people, hey look, at the time this was the only good model available."
Elsewhere on his YouTube channel are guides to making your own celebrity voice. Led by his tutorials, I enrol as a member of an AI hub on Discord, the social-media platform founded by computer gamers. There you can find vocal models and links to the programming tools for processing them.
These tools have abstruse names like "so-vits-svc" and initially look bewildering, though it's possible to use them without programming experience. The voice models are formulated from a cappella vocals taken from recordings, which are turned into sets of data. It takes several hours of processing to create a convincing musical voice. Modellers refer to this as "training", as though the vocal clone were a pet.
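For readers curious about the mechanics, a rough sketch of that workflow in Python follows. It assumes local installs of the open-source Demucs separator (to pull a cappella stems out of finished recordings) and the so-vits-svc-fork package, which wraps the preprocessing, training and conversion steps behind an "svc" command; the exact commands, flags and file paths are assumptions that vary between versions and forks, so treat this as illustrative rather than a recipe.

```python
# Illustrative sketch only: command names, flags and paths are assumptions
# based on the Demucs and so-vits-svc-fork projects and differ by version.
import subprocess

# 1. Isolate a cappella vocals from a source recording -- the raw material
#    that gets turned into training data.
subprocess.run(["demucs", "--two-stems=vocals", "source_track.mp3"], check=True)

# 2. Preprocess the vocal stems and train the voice model: the hours-long
#    step that modellers call "training".
for step in (["svc", "pre-resample"],
             ["svc", "pre-config"],
             ["svc", "pre-hubert"],
             ["svc", "train"]):
    subprocess.run(step, check=True)

# 3. Convert a new vocal performance into the cloned voice.
subprocess.run(["svc", "infer", "my_vocal_take.wav",
                "-m", "logs/44k/model.pth",
                "-c", "configs/44k/config.json"], check=True)
```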
Amid the Travis Scotts and Bad Bunnies on the Discord hub is a Tom Waits voice. It's demonstrated by a clip of the AI-generated Waits bellowing a semi-plausible version of Lil Nas X's country-rap hit "Old Town Road". But I can't make the model work. So my next port of call is a website to do it for me.
Voicify.ai creates voices for users. It was set up by Aditya Bansal, a computer science student at Southampton University. He noticed AI cover songs mushrooming and within a week had his website up and running. Speed is of the essence in a gold rush.
"Because the tech is quite new, there's a lot of people working on it and trying to get a product out, so I had to do it quickly," the 20-year-old says by video call. He has made an AI voice for himself, in the style of the deceased American rapper Juice Wrld, "but my singing voice isn't good so I can't reach the notes." (As I will learn, a degree of musical talent is needed in the world of AI-generated songcraft.)
When we speak, Bansal is a week away from second-year exams for which he hasn't yet started revising. With payment tiers ranging from £8.99 to £89.99, Voicify.ai is proving a lucrative distraction. "It started off pretty much US/UK," he says of its users. "Now I've seen it go worldwide." Record labels have also contacted him, wanting to make models of their artists for demo tracks, which are used as sketches before the full recording process.
He won't put an exact figure on his earnings but his laugh carries a disbelieving note when I ask. "It's a lot," he says, with a smile shading from bashful to gleeful.
To create my voice, I go to another site to extract a cappella sound files of Waits singing tracks from his album Rain Dogs, which I then feed into Voicify.ai. Several hours later, my AI Waits is ready. I test it with Abba's "Dancing Queen", an MP3 of which I drag-and-drop into the website.
The song re-emerges with the Abba vocals replaced by the AI-generated Waits voice. It starts in a rather wobbly way, as if the Waits-bot is flummoxed by the assignment. But by the time it reaches "Friday night and the lights are low", it's bellowing away with full-throated commitment. It really does sound like Tom Waits covering Abba. Next comes the trickier hurdle of making a new song.
One possible obstacle is the law. In 1990, Waits won a landmark court case in the US against Frito-Lay, manufacturers of Doritos corn chips, for using a gruff-voiced impersonator in an advertisement. Could the same apply to AI vocal clones? The Recording Industry Association of America argues that algorithmic voice training infringes on artists' copyright as it involves their recordings, like my use of Rain Dogs' songs. But that can be countered by fair use arguments that protect parodies and imitations.
"If we do get a court case, it will come to whether you're trying to make money from it, or is it a viral parody that you're doing for legitimate purposes?" reckons Dr Luke McDonagh of the London School of Economics, an expert on intellectual property rights and the arts. "If you're doing it to make money, then the law will stop you because you're essentially free-riding on the brand image, the voice of someone else's personality. It will be caught by the law in some way, but it's not necessarily a matter for copyright."
Alas – but perhaps happily from the point of view of legal fees – my AI Waits impression will not trigger a definitive voice-clone update of Waits vs Frito-Lay. The reason lies not in the dense thickets of jurisprudence, but rather the woefulness of my attempted AI-assisted mimicry.
To get lyrics I go to ChatGPT, the AI chatbot released last November by research laboratory OpenAI. It responds to my query for a song in the style of Tom Waits with a game but facepalmy number called "Gritty Troubadour's Backstreet".
"The piano keys are worn and weary,/As he pounds them with a weathered hand,/The smoke curls 'round his whiskey glass,/A prophet of a forgotten land," runs a verse. This clunky pastiche, produced with incredible speed from analysing Waitsian lyrical matter contained on the internet, conforms to the grand weeper side of the singer's oeuvre.
For the tune, I turn to Boomy, an AI music creator. Since launching in California in 2019, it claims to have generated more than 15mn songs, which it calculates as 14 per cent of the world's recorded music. Earlier this month, Spotify was reported to have purged tens of thousands of Boomy-made songs from its catalogue following accusations about bots swarming the site to artificially boost streaming numbers.
My additions to Boomy's immense pile of songs are undistinguished. To create a track, you pick a style, such as "lo-fi" or "global groove", and then set basic parameters, like the drum sound and tempo. There isn't an option to select the style of a named artist. After fiddling with it to make the music as jazzy as possible, I end up with an odd beat-driven thing with a twangy bass.
There's a button for adding vocals. To my mortification, I find myself hollering "Gritty Troubadour's Backstreet" in my gruffest voice over the weird Boomy music at my computer. Then it's back to Voicify.ai to Waits-ify the song. The result is a monstrosity. My Waits voice sounds like a hoarse English numpty enunciating doggerel. My experiment with AI voice generation has been undone by a human flaw: I can't sing.
You need musical skill to make an AI song. The voice clones require a real person to sing the tune or rap the words. When a UK rock band called Breezer released an imaginary Oasis album last month under the name "Aisis", they used a voice clone to copy Liam Gallagher but wrote and performed the songs themselves. "I sound mega," the real Gallagher tweeted after hearing it.
Artists are divided. Electronic musician Grimes, a committed technologist, is creating her own voice-mimicking software for fans to use provided they split royalty earnings with her. In contrast, Sting recently issued an old-guard warning about the "battle" to defend "our human capital against AI". After a vocal double imitated him covering a song by female rapper Ice Spice, Drake wrote on Instagram, with masculine pique: "This the final straw AI".
"People are right to be concerned," Holly Herndon states. The Berlin-based US electronic musician is an innovative figure in computer music who used a custom-made AI recording system for her 2019 album Proto. Her most recent recording is a charmingly mellifluous duet with a digital twin, Holly+, in which they cover Dolly Parton's tale of obsessive romantic rivalry, "Jolene".
Holly+'s voice was cloned from recordings of Herndon singing and speaking. "The first time I heard my husband [artist and musician Mat Dryhurst] sing through my voice in real time, which was always our goal, was very striking and memorable," she says by email. The cloned voice has been made available for public use, though not as a free-for-all: a "clear protocol of attribution", in Herndon's words, regulates usage. "I think being permissive with the voice in my circumstance makes the most sense, because there is no way to put this technology back in the box," she explains.
Almost every stage of technological development in the history of recorded music has been accompanied by dire forecasts of doom. The rise of radio in the 1920s provoked anxiety about live music being undermined. The spread of drum machines in the 1980s was nervously observed by drummers, who feared landing with a tinny and terminal thump on the scrap heap. In neither case were these predictions proved correct.
"Drumming is still thriving," Herndon says. "Some artists became virtuosic with drum machines, synths and samplers, and we pay attention to the people who can do things with them that are expressive or impressive in ways that are hard for anyone to achieve. The same will be true for AI tools."
Pop music is the medium that has lavished the most imaginative resources on the sound of the voice over the past century. Since the adoption of electric microphones in recording studios in 1925, singers have been treated as the focal point in records, like Hollywood stars in close-up on the screen. Their vocals are designed to get inside our heads. Yet famous singers are also far away, secreted behind their barrier of celebrity. Intimacy is united with inaccessibility.
That's why pop stars command huge social media followings. It's also why their fans are currently running amok with AI voice-generating technology. The ability to make your idol sing or speak takes pop's illusion of closeness to the logical next level. But the possessors of the world's most famous voices can take comfort. For all AI's deepfakery, the missing ingredient in any successful act of mimicry remains good old-fashioned talent – at least for now.
Ludovic Hunter-Tilney is the FT's pop critic