DeepSeek Advances Self-Improving AI Models With Tsinghua Collaboration
The Chinese AI startup is developing DeepSeek-GRM, next-generation models designed to enhance reasoning and efficiency through a novel feedback loop mechanism.
- DeepSeek is collaborating with Tsinghua University to develop self-improving AI models using a new reinforcement learning approach called self-principled critique tuning (SPCT).
- The upcoming DeepSeek-GRM models aim to improve reasoning and efficiency by incorporating a feedback loop that rewards better performance.
- Preliminary benchmarks suggest these models could outperform competing models such as Google's Gemini, Meta's Llama, and OpenAI's GPT-4, though independent verification remains limited.
- DeepSeek plans to release DeepSeek-GRM as open-source technology, continuing its strategy of disrupting the AI market by prioritizing accessibility and innovation.
- Experts have raised ethical and technical concerns about self-improving AI, including risks like 'model collapse' and the potential need for safeguards such as a 'kill switch.'
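The article describes SPCT only at a high level: the model generates its own evaluation principles, critiques candidate outputs against them, and turns those critiques into a reward signal. A minimal toy sketch of that feedback loop is below; every function name and scoring rule here is an illustrative assumption, not DeepSeek's actual method.

```python
# Hypothetical sketch of a critique-based reward loop in the spirit of SPCT.
# The rubric, critique rules, and aggregation are toy assumptions.

def generate_principles(prompt):
    # A real SPCT model would generate evaluation principles adaptively
    # per prompt; here we return a fixed toy rubric.
    return ["is_relevant", "is_concise"]

def critique(response, principle):
    # Toy critique: score 1.0 if the response satisfies the principle.
    if principle == "is_relevant":
        return 1.0 if "answer" in response else 0.0
    if principle == "is_concise":
        return 1.0 if len(response.split()) <= 10 else 0.0
    return 0.0

def reward(prompt, response):
    # Aggregate per-principle critiques into a scalar reward that a
    # reinforcement-learning update could then maximize.
    principles = generate_principles(prompt)
    return sum(critique(response, p) for p in principles) / len(principles)

candidates = ["a long rambling reply " * 5, "short answer"]
best = max(candidates, key=lambda r: reward("q", r))
print(best)  # prints "short answer"
```

The key idea the article attributes to DeepSeek-GRM is that the reward itself comes from model-generated principles and critiques rather than a fixed external rubric, closing the loop between generation and evaluation.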