Thousands of people are selling their identities to train AI – but at what cost?


One morning last year, Jacobus Louw set out on his daily neighborhood walk to feed the seagulls he finds along the way. Except this time, he recorded several videos of his feet and the view as he walked on the pavement. The videos earned him $14, about 10 times the country’s hourly minimum wage – or, for Louw, a 27-year-old based in Cape Town, South Africa, half a week’s worth of groceries.

The video was for an “Urban Navigation” task Louw found on Kled AI, an app that pays contributors for uploading their data, such as videos and photos, to train artificial intelligence models. In a couple of weeks, Louw made $50 by uploading pictures and videos of his everyday life.

Thousands of miles away in Ranchi, India, Sahil Tigga, a 22-year-old student, regularly earns money by letting Silencio, which crowdsources audio data for AI training, access his phone’s microphone to capture ambient city noise, such as the din inside a restaurant or the traffic at a busy junction. He also uploads recordings of his voice. Tigga travels to capture unique settings, like hotel lobbies not yet documented on Silencio’s map. He earns over $100 a month doing this, enough to cover all his food expenses.

And in Chicago, Ramelio Hill, an 18-year-old welding apprentice, made a couple hundred dollars by selling his private phone chats with friends and family to Neon Mobile, a conversational AI training platform that pays $0.50 per minute. For Hill, the calculation was simple: tech companies already capture so much of his private data that he might as well get a cut of the profit.

These gig AI trainers – who upload everything from scenes around them to photos, videos and audio of themselves – are at the frontlines of a new global data gold rush. As Silicon Valley’s hunger for high-quality, human-grade data outpaces what can be scraped from the open internet, a thriving industry of data marketplaces has emerged to bridge the gap. From Cape Town to Chicago, thousands of people are now micro-licensing their biometric identities and intimate data to train the next generation of AI.

But this new gig economy comes with trade-offs. In exchange for a few dollars, its trainers are fueling an industry that may eventually render their skills obsolete, while leaving some of them vulnerable to a future of deepfakes, identity theft and digital exploitation that they are only just beginning to understand.

Keeping the AI wheel spinning

AI’s language models, such as those behind ChatGPT and Gemini, demand vast troves of learning material to improve, but they’re facing a data drought. The websites underpinning the most used training datasets, such as C4, RefinedWeb and Dolma – sources that account for a quarter of the highest-quality data on the web – are now restricting generative AI companies from training models with their content. Researchers estimate AI companies will run out of fresh high-quality text to train on as soon as 2026. While some labs have resorted to feeding back the synthetic data their AI generates, such a recursive process can lead models to produce error-filled slop and eventually degrade entirely, a failure known as model collapse.

Gig AI trainers, who upload everything from scenes around them to photos, videos, and audio of themselves, are at the frontlines of a new global data gold rush. Photograph: Arun Sankar/AFP via Getty Images

This is where apps such as Kled AI and Silencio step in. On these kinds of data marketplaces, millions of people are monetizing their identities to feed and train AI. Beyond Kled AI, Silencio and Neon Mobile, there are many options for AI trainers: Luel AI, backed by the famed startup incubator Y Combinator, sources multilingual conversations for about $0.15 a minute, while ElevenLabs lets users digitally clone their voices and license them to anyone for a base fee of $0.02 a minute.

Gig AI training is an emerging category of work, and it will grow substantially, said Bouke Klein Teeselink, an economics professor at King’s College London.

AI companies know that paying people to license their data helps avoid the risk of copyright disputes they could face if they relied entirely on content scraped from the web, Teeselink said. These companies also need high-quality data in order to model new, improved behaviors in their systems, said Veniamin Veselovsky, an AI researcher. “Human data, for now, is the gold standard to sample from outside of the distribution of the model,” Veselovsky added.

The humans fueling the machines, particularly those in developing countries, often need the money and have few other options for earning it. For many gig AI trainers, the work is a pragmatic response to economic disparity. In countries with high unemployment and devalued currencies, earning US dollars is often more stable and rewarding than local work. Some trainers, unable to secure even entry-level jobs, do this out of necessity. And in wealthier nations, the rising cost of living has turned selling one’s data into a logical financial pivot.

However, the pitfalls of gig AI training can be invisible. On some AI marketplaces, data trainers grant irrevocable, royalty-free licenses that allow companies to create “derivative works”, meaning a 20-minute voice recording today could power an AI customer service bot for the next few years, with the trainer never seeing another cent. Plus, due to the lack of transparency in these marketplaces, a user’s face could end up in a facial recognition database or a predatory advertisement half a world away, with virtually no legal recourse.

Louw, the AI trainer in Cape Town, is aware of the privacy trade-offs. And though the income is erratic and not sufficient to cover his full monthly expenses, he is willing to accept these conditions to earn money. He struggled with a nervous disorder for years and couldn’t secure a job, but money earned on AI marketplaces, including Kled AI, allowed him to save up for a $500 spa training course to become a masseur.

“As a South African, being paid in USD is more worth it than people think,” Louw said.

Mark Graham, a professor of internet geography at the University of Oxford and author of Feeding the Machine, acknowledged that for individuals in developing countries, the money can be meaningful in the short term, but warned that “structurally this work is precarious, non-progressive and effectively a dead end”.

AI marketplaces rely on a “race to the bottom in wages”, added Graham, and a “temporary demand for human data”. Once this demand shifts, “workers are left with no protections, no transferable skills, and no safety net”.

The only winners that emerge, Graham said, are “the platforms in the global north [that] capture all the enduring value”.

Cape Town, South Africa. Photograph: Peter Titmuss/Universal Images Group/Getty Images

Carte blanche permissions

Hill, the Chicago-based AI trainer, had conflicting feelings about selling his private phone calls to Neon Mobile. For about 11 hours of calls, he earned $200, but he said the app would frequently go offline and fail to release payments he was owed. “Neon was always shady to me, but I kept using it to get some extra, easy money for bills and other miscellaneous expenses,” said Hill.

Now he’s reconsidering how easy that money was. In September, just weeks after it had launched, Neon Mobile went offline after TechCrunch discovered a security flaw that allowed anyone to access users’ phone numbers, call recordings and transcripts. Hill said Neon Mobile never informed him of the breach, and he now worries about how his voice might be misused online.

What Jennifer King, a data privacy researcher at the Stanford Institute for Human-Centered Artificial Intelligence, finds concerning is that AI marketplaces are unclear about how and where users’ data will be deployed. Without negotiating or knowing their rights, she added, “consumers run a risk of their data being repurposed in ways that they don’t like or didn’t understand or anticipate, and they’ll have little recourse if so”.

When AI trainers share their data on Neon Mobile and Kled AI, they’re granting a carte blanche license (worldwide, exclusive, irrevocable, transferable and royalty-free) to sell, use, publicly display and store their likeness – and even to create derivative works from it.

Kled AI’s founder, Avi Patel, said his company’s data agreements limit use to AI training and research purposes. “The entire business depends on user trust. If contributors believe their data could be misused, the platform stops working.” He said the company vets businesses before selling them datasets, to avoid working with those with “questionable intent”, such as pornography producers, and with “government bodies” it believes could use the data in ways that conflict with that trust.

Neon Mobile did not respond to a request for comment.

According to Enrico Bonadio, a law professor at City St George’s, University of London, the terms of these agreements permit the platforms, as well as their clients, to do “almost anything with that material, forever, with no further payment and no realistic way for the contributor to withdraw consent or meaningfully renegotiate”.

More troubling risks include trainers’ data being used for deepfakes and impersonation. Even though data marketplaces claim to strip the data of identifying details, such as names and locations, before selling it, biometric patterns are, by their nature, hard to anonymize robustly, Bonadio added.

Seller’s regret

Even when AI trainers are able to negotiate more nuanced protections for how their data will be used, they can still feel regret. When Adam Coy, an actor from New York, sold his likeness in 2024 for $1,000 to Captions, an AI-powered video editor that’s now called Mirage, his agreement ensured his identity wouldn’t be used for any political means or for selling alcohol, tobacco or pornography, and that the license would expire in a year.

Captions did not respond to a request for comment.

Not long after, Coy’s friends started forwarding him videos they’d found online featuring his face and voice that had garnered millions of views. In one of these videos, an Instagram reel, Coy’s AI replica claims to be a “vagina doctor” and promotes unproven medical supplements for pregnant and postpartum women.

“It felt embarrassing to explain it to people,” Coy said.

“The comments are strange to read because they comment on my physical appearance, but it’s not really me,” Coy added. “My feeling [while deciding to sell my likeness] was that most models were going to be scraping the internet for data and likeness [anyway], so may as well be paid for it.”

Coy said he hasn’t signed up for any AI data gigs since. He’d only consider it, he said, if a company offered major compensation.
