OpenAI’s early disclosure of its gold-level score has underscored gaps in verification protocols for AI performance in strict exam settings.