Overview
- Neema Raphael said, "We’ve already run out of data," noting that developers are increasingly turning to synthetic data or training on other models’ outputs.
- He warned about the risk of model collapse, where repeatedly training on AI-generated content degrades accuracy and amplifies errors.
- Goldman Sachs argues that large stores of information behind corporate firewalls remain underused and could provide higher-quality inputs for future systems.
- Raphael said data scarcity should not prove a major constraint if firms clean, normalize, and properly govern their internal datasets.
- The remarks echo broader concerns about a looming peak-data crunch: Nature has forecast that the supply of human-generated training data could run out around 2028, and Ilya Sutskever has predicted that the era of rapid gains from scaling will end. Some reporting suggests companies may respond by shifting focus toward more agentic AI.