DeepSeek Advances Self-Improving AI Models With Tsinghua Collaboration
The Chinese AI startup is developing DeepSeek-GRM, next-generation models designed to enhance reasoning and efficiency through a novel feedback loop mechanism.
- DeepSeek is collaborating with Tsinghua University to develop self-improving AI models using a new reinforcement learning approach called self-principled critique tuning (SPCT).
- The upcoming DeepSeek-GRM models aim to improve reasoning and efficiency by incorporating a feedback loop that rewards better performance.
- Preliminary benchmarks suggest these models could outperform competing models such as Google's Gemini, Meta's Llama, and OpenAI's GPT-4, though independent verification remains limited.
- DeepSeek plans to release DeepSeek-GRM as open-source technology, continuing its strategy of disrupting the AI market by prioritizing accessibility and innovation.
- Experts have raised ethical and technical concerns about self-improving AI, including risks like 'model collapse' and the potential need for safeguards such as a 'kill switch.'
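The article describes SPCT only at a high level: the model generates its own evaluation principles, critiques candidate outputs against them, and turns those critiques into a reward signal. A minimal toy sketch of that feedback loop is below; every function name and scoring rule here is an illustrative assumption, not DeepSeek's actual method.

```python
# Hypothetical sketch of a critique-based reward loop in the spirit of SPCT.
# The rubric, critique rules, and aggregation are toy assumptions.

def generate_principles(prompt):
    # A real SPCT model would generate evaluation principles adaptively
    # per prompt; here we return a fixed toy rubric.
    return ["is_relevant", "is_concise"]

def critique(response, principle):
    # Toy critique: score 1.0 if the response satisfies the principle.
    if principle == "is_relevant":
        return 1.0 if "answer" in response else 0.0
    if principle == "is_concise":
        return 1.0 if len(response.split()) <= 10 else 0.0
    return 0.0

def reward(prompt, response):
    # Aggregate per-principle critiques into a scalar reward that a
    # reinforcement-learning update could then maximize.
    principles = generate_principles(prompt)
    return sum(critique(response, p) for p in principles) / len(principles)

candidates = ["a long rambling reply " * 5, "short answer"]
best = max(candidates, key=lambda r: reward("q", r))
print(best)  # prints "short answer"
```

The key idea the article attributes to DeepSeek-GRM is that the reward itself comes from model-generated principles and critiques rather than a fixed external rubric, closing the loop between generation and evaluation.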