Measuring Massive Multitask Language Understanding (MMLU) is a popular benchmark for evaluating the capabilities of large language models. It has inspired several other versions and spin-offs, such as MMLU-Pro, MMMLU, and MMLU-Redux.
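MMLU items are four-option multiple-choice questions, and the benchmark's metric is plain accuracy over the model's predicted answer letters. A minimal sketch of that scoring scheme (the sample items and the `grade` helper below are illustrative, not taken from the benchmark itself):

```python
# Illustrative MMLU-style items: each has a question, four choices,
# and a gold answer letter. These examples are NOT from MMLU.
items = [
    {"question": "What is 2 + 2?",
     "choices": ["3", "4", "5", "6"], "answer": "B"},
    {"question": "H2O is commonly known as?",
     "choices": ["Salt", "Sand", "Water", "Air"], "answer": "C"},
]

def grade(predictions, items):
    """Return accuracy of predicted letters (A-D) against gold answers."""
    correct = sum(p == item["answer"] for p, item in zip(predictions, items))
    return correct / len(items)

print(grade(["B", "C"], items))  # both correct -> 1.0
```

In practice, evaluation harnesses extract the answer letter from the model's generated text (or compare per-option log-likelihoods) before grading, but the final score reduces to this accuracy computation.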
The 70-billion-parameter model delivers performance comparable to that of larger models at a fraction of the cost, with improved accessibility and energy efficiency.