About your first point: think of it like inbreeding. You need fresh genes in the pool or mutations occur.
A generative model will generate some relevant results and some irrelevant ones, and it’s the job of humans to curate that.
However, the more content the LLM generates, the more of it ends up on the web and thus becomes part of its training data.
Imagine that 95% of results are accurate, and that only 1% of the output escapes fact-checking and gets released onto the internet. Other humans will complain about it, but it will be used as LLM input regardless. So the next round of input is 99% accurate, and only 95% of what gets generated from it will be accurate.
It’s literally a sequence that will reach very inaccurate values very fast:
f(1) = 1
f(n) = f(n-1) * 0.95
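To put rough numbers on that sequence, here is a minimal Python sketch. It assumes the simplified model above, where each generation keeps a flat 95% of the previous generation’s accuracy; the function names are just made up for illustration, and real training pipelines are obviously messier than this.

```python
# Simplified model from the comment: f(1) = 1, f(n) = f(n-1) * 0.95.
# RETENTION is the assumed per-generation accuracy factor, not a measured value.
RETENTION = 0.95


def accuracy_after(n: int, retention: float = RETENTION) -> float:
    """Return f(n), the accuracy left at generation n (generation 1 is the original data)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    value = 1.0
    for _ in range(n - 1):
        value *= retention
    return value


def first_generation_below(threshold: float, retention: float = RETENTION) -> int:
    """Return the first generation n where f(n) drops below the threshold."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    if not 0.0 < retention < 1.0:
        raise ValueError("retention must be in (0, 1)")
    n = 1
    value = 1.0
    while value >= threshold:
        value *= retention
        n += 1
    return n


if __name__ == "__main__":
    for generation in (1, 2, 5, 10, 20):
        print(generation, round(accuracy_after(generation), 4))
    print("first generation under 50%:", first_generation_below(0.5))
```

Under these assumptions the accuracy falls under half by generation 15, which is the "degenerates fast" part.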
You can mitigate it by not training on generated data, but as long as AI content replaces genuine content, especially with images, AI will train on its own output and degenerate fast.
About the second point: you can pay artists to train models, sure, but that’s not so clear for text-based generative models, which depend on expert input to give relevant responses. The same goes for voice models: no amount of money would be enough for a voice actor, because taking it would effectively destroy their future jobs and thus their future income.
When a human creates art, there is some intent in it, some emotions they felt when they decided on the color palette, the form… The fact that someone created it and that there’s some story behind it gives the piece weight.
Why is an abstract monument created by humans something other humans like to see, while a landslide is not? Because there’s a story behind it.
AI art is lifeless because there’s no intent behind it, and there’s no author’s skill to appreciate. It’s just prompt mastery and anyone can replicate it. It’s cheap.
It’s like comparing human-made sculptures with 3D-printed sculptures, if 3D printers could create fine details and work at big sizes. It’s cheap.