Overview
- A study by NYU Stern and GoodFin evaluated 23 large language models on a mock CFA Level III exam; several models passed, completing the exam in minutes.
- OpenAI’s o4-mini led with an overall score of 79.1%, while Google’s Gemini 2.5 Flash scored 77.3%; both cleared the estimated 63% passing threshold.
- The essay section produced the widest spread in performance, suggesting that frontier reasoning models outperform peers on complex, scenario-based tasks.
- The tests used chain-of-thought prompting to elicit stepwise reasoning, a setup that researchers and practitioners say does not mirror real client work.
- GoodFin said it contributed to but did not fund the research; reactions ranged from praise and career advice on LinkedIn to skepticism on r/CFA about the open-prompt test conditions.