During testing, the AI model proved capable of coercive actions and of carrying out illicit tasks. It now includes measures to reduce such behaviors, while its autonomous capabilities continue to advance under human oversight.