Because LLMs need human-produced material to work with. If the incentive to produce such material drops, generative models will start producing garbage.
It has already started to be a problem with the current LLMs that have exhausted most easily reached sources of content on the internet and are now feeding off LLM-generated content, which has resulted in a sharp drop in quality.
“It has already started to be a problem with the current LLMs that have exhausted most easily reached sources of content on the internet and are now feeding off LLM-generated content, which has resulted in a sharp drop in quality.”
Do you have any sources to back that claim? LLMs are rising in quality, not dropping, afaik.
It’s still being researched, but there are papers showing that, mathematically, generative models cannot keep their quality when they are trained on their own output. If you see an increase in quality, it’s usually because their developers have added a new trove of human-generated data.
In simple terms, these models need two things to be able to generate useful output: they need external guidance about which input is good and which is bad (throughout the process), and they need both types of input to reach a certain critical mass.
Since the reliability of these models is never 100%, with every input-output cycle the quality drops.
If the model input is very well curated and restricted to known good sources it can continue to improve (and by improve I mean asymptotically approach a value which is never 100% but high enough, like over 90%). But if models are allowed to feed on generative output (being thrown back at them by social bots and website generators) their quality will take a dive.
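To make the "every cycle loses a bit" argument concrete, here is a toy sketch. It is not an LLM and not taken from any of the papers; it assumes the "model" is just a Gaussian that gets refit to samples, and the names `simulate` and `fresh_fraction` are made up for illustration. In the closed loop the model is refit only to its own output; in the second run half of each batch is fresh data from the original "human" distribution.

```python
import random
import statistics


def simulate(generations, n_samples, fresh_fraction, seed=0):
    """Return the fitted standard deviation after each generation.

    Hypothetical toy model: a Gaussian refit to a small batch each round.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0: the "human" distribution
    n_fresh = int(n_samples * fresh_fraction)
    history = []
    for _ in range(generations):
        # Samples produced by the current model (its own output).
        data = [rng.gauss(mean, std) for _ in range(n_samples - n_fresh)]
        # Optional fresh samples from the original human distribution.
        data += [rng.gauss(0.0, 1.0) for _ in range(n_fresh)]
        # Refit the model to the batch (maximum-likelihood estimate).
        mean = statistics.fmean(data)
        std = statistics.pstdev(data)
        history.append(std)
    return history


closed_loop = simulate(generations=500, n_samples=20, fresh_fraction=0.0)
with_fresh = simulate(generations=500, n_samples=20, fresh_fraction=0.5)
print(f"closed loop, final std: {closed_loop[-1]:.3g}")
print(f"50% fresh data, final std: {with_fresh[-1]:.3g}")
```

In the closed loop the fitted spread tends to shrink toward zero, because each refit on a finite batch underestimates the variance a little and the errors compound. With fresh data mixed in, the spread stays in the neighbourhood of the original. Real training pipelines are far more complicated, so treat this only as an intuition pump.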
I want to point out that this is not just an AI issue. Humans don’t have a 100% correct output either, and we have the exact same problem: feeding on our own online garbage. For us the trouble showed up much more slowly, over the last couple of decades or so, in the form of “fake news”, weaponized misinformation, etc.
AI merely accelerated the process; it hit the limits of reliability much sooner. We will need to solve this issue either way, and we would have needed to solve it even if AI weren’t a thing. In a way the appearance of AI helped us, because it forces us to deal with the issue of information reliability sooner rather than later.
I wouldn’t be concerned about that; the mathematical models make assumptions that don’t hold in the real world. There’s still plenty of guidance in the loop, from things such as humans up/downvoting and people generating many pictures before selecting the best one to post. There are also, as you say, lots of places with strong human curation, such as Wikipedia or official documentation for various tools. And there’s the option of running better models against old datasets as the tech progresses.