Particle.news

Frontier AI Models Clear Mock CFA Level III, Raising Practical Questions for Finance

The results point to chain-of-thought prompting as the key driver of performance on essay-style reasoning questions.

Overview

  • NYU Stern and GoodFin evaluated 23 AI systems on a mock CFA Level III exam that combines multiple-choice and essay questions.
  • OpenAI’s o4-mini scored 79.1% and Google’s Gemini 2.5 scored about 77%, both above the 63% pass threshold, with Anthropic’s Claude Opus also passing.
  • Models clustered around 71%–75% on multiple-choice items, but essay scores varied widely, distinguishing reasoning-focused systems.
  • Researchers used chain-of-thought prompting to elicit step-by-step explanations, and some models completed the exam in minutes.
  • Industry voices cautioned that exam success does not equal client-ready judgment and urged hybrid use with human oversight; by comparison, the human pass rate for the February sitting was 49%.