Measuring Massive Multitask Language Understanding (MMLU) is a popular benchmark for evaluating the capabilities of large language models. It has inspired several variants and spin-offs, such as MMLU-Pro, MMMLU, and MMLU-Redux.
The open-weight model ships with runnable code and reference kernels under an MIT license, enabling independent benchmarking.
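To make the benchmarking concrete, here is a minimal sketch of MMLU-style multiple-choice scoring against the public `cais/mmlu` dataset on Hugging Face; the `ask()` callable standing in for the model under test is a hypothetical placeholder, not part of the actual release.

```python
# Minimal sketch of MMLU-style multiple-choice evaluation.
# Assumes a hypothetical ask(prompt) -> str callable that wraps the model under test.
from datasets import load_dataset

LETTERS = "ABCD"

def format_prompt(example):
    # Render one MMLU item as a four-option multiple-choice question.
    lines = [example["question"]]
    lines += [f"{LETTERS[i]}. {choice}" for i, choice in enumerate(example["choices"])]
    lines.append("Answer:")
    return "\n".join(lines)

def evaluate(ask, limit=100):
    # In cais/mmlu, "answer" is the 0-based index of the correct choice.
    data = load_dataset("cais/mmlu", "all", split="test").select(range(limit))
    correct = 0
    for example in data:
        reply = ask(format_prompt(example)).strip()
        # Take the first A-D letter in the reply as the model's pick.
        pick = next((ch for ch in reply if ch in LETTERS), None)
        correct += pick == LETTERS[example["answer"]]
    return correct / limit
```

Accuracy here is simply the fraction of items where the model's chosen letter matches the reference answer; published evaluations typically run the full test split and often report per-subject breakdowns.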