Generosity AI

Problem Library

LLM Self-Pollution

The Internet is increasingly filled with content generated by LLMs. This may produce unpredictable results in future generations of LLMs as they begin to consume their own outputs as part of their training data. How can we detect and filter AI-generated content out of the dataset? Alternatively, can we directly study the effects of AI-produced text on AI training efficacy?


Research Question
How can we directly study the effects of AI-produced text on AI training efficacy?

Goals
Legal and Regulatory Compliance