OpenAI Releases Open-Weight LLMs With Safety Challenge and Azure AI Foundry Integration

A $500,000 red-teaming contest will assess the robustness of the models, which are released under the Apache 2.0 license, before widespread rollout