When her phone vibrates with a WhatsApp alert from her “Task Hunters” group, she has little time to react.

Fuentes, 35, rushes to her computer and logs on to Appen, an artificial-intelligence data platform where she has been tagging data for the past decade. She works quickly as she competes with thousands of other crowd-workers for 5–25 cents per task. With each click, she may choose the genre of a movie, decide if an image is AI-generated, or solve a math problem.

Fuentes is among the hundreds of thousands of Venezuelans who do informal work for the tech industry. As Venezuela’s economic crisis worsened and its currency became nearly worthless around 2018, educated Venezuelans signed up on AI-training and freelancing platforms to earn in U.S. dollars. They formed up to 75% of the workforce at companies like Mighty AI and Scale AI in 2018. Remotasks even created a special program to attract Venezuelan workers.

They annotated all kinds of data to train AI tools, such as vision models, autonomous vehicles, and warehousing robots. They also moderated violent content and wrote articles to optimize websites for search.

But with the rise of generative AI, such digital jobs have become scarce and poorly paid, workers and researchers told Rest of World. Without formal contracts, the workers have little choice but to find ways to compete with AI, or quit.

    • Miaou@jlai.lu
      9 days ago

      Confusing article, or the writer has no damn clue what those jobs were for in the first place.

    • Jrockwar@feddit.uk
      9 days ago

      This can be correct if they’re talking about training smaller models.

      Imagine this case. You are an automotive manufacturer that uses ML to detect pedestrians, vehicles, etc. with cameras, like Tesla does. This has to be done by a small model with a relatively low power footprint that can run in a car, not a datacentre. To improve its performance, you need to fine-tune it with labelled data of traffic situations with pedestrians, vehicles, etc. That labelling would be done manually…

      … except when we get to a point where the latest Gemini/Llama/GPT/whatever, which is so beefy that it could never run in that low-power application… is also beefy enough to accurately classify and label the data that the smaller model needs for training.

      It’s like an older sibling teaching a small kid how to do sums: not an actual maths teacher, but it does the job and is a lot cheaper, or semi-free.
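The idea in the comment above (a big model labels data, a small model trains on those labels) can be sketched roughly as follows. This is a toy illustration, not how any manufacturer actually does it: `teacher_label` is a hypothetical stand-in for a call to a large model, the two features (object width and speed) and the 0.8 confidence cutoff are made up, and the "student" is a simple nearest-centroid classifier rather than a real vision model.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Features = Tuple[float, float]   # (object width in metres, speed in m/s) -- illustrative only
Labelled = Tuple[Features, str]


def teacher_label(features: Features) -> Tuple[str, float]:
    """Hypothetical stand-in for a large model: returns (label, confidence)."""
    width, _speed = features
    label = "pedestrian" if width < 1.0 else "vehicle"
    # Toy confidence: the further the width is from the 1.0 m boundary, the surer we are.
    confidence = min(1.0, 0.5 + abs(width - 1.0))
    return label, confidence


def auto_label(
    samples: Sequence[Features],
    teacher: Callable[[Features], Tuple[str, float]],
    min_confidence: float = 0.8,  # assumed cutoff; low-confidence samples would go to humans
) -> List[Labelled]:
    """Keep only the samples the teacher labels with enough confidence."""
    kept: List[Labelled] = []
    for features in samples:
        label, confidence = teacher(features)
        if confidence >= min_confidence:
            kept.append((features, label))
    return kept


def train_student(labelled: Sequence[Labelled]) -> Dict[str, Features]:
    """Train the small model: here, just one centroid per label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (width, speed), label in labelled:
        acc = sums[label]
        acc[0] += width
        acc[1] += speed
        acc[2] += 1
    return {label: (w / n, s / n) for label, (w, s, n) in sums.items()}


def predict(centroids: Dict[str, Features], features: Features) -> str:
    """Cheap inference: pick the label whose centroid is closest."""
    return min(
        centroids,
        key=lambda label: (centroids[label][0] - features[0]) ** 2
        + (centroids[label][1] - features[1]) ** 2,
    )


unlabelled = [(0.5, 1.2), (0.6, 1.5), (1.9, 12.0), (2.1, 15.0), (1.05, 3.0)]
student = train_student(auto_label(unlabelled, teacher_label))
print(predict(student, (0.7, 1.0)))
```

The point of the confidence cutoff is that the teacher is not always right: ambiguous samples are dropped (or, in practice, could be routed to a human annotator), which is where the remaining manual labelling work would come from.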