Humanity's Last Exam is a language model benchmark consisting of 2,500 questions across a broad range of subjects. It was created jointly by the Center for AI Safety and Scale AI.
Released under the Apache 2.0 license and distributed on platforms such as Hugging Face, the gpt-oss models enable text-only reasoning on local hardware, though with lower accuracy than OpenAI's proprietary o-series models.