Overview
- MAI-DxO was tested on 304 complex case studies from the New England Journal of Medicine using the Sequential Diagnosis Benchmark.
- The system achieved 80 percent diagnostic accuracy, compared with 20 percent for human physicians under the same test conditions.
- Ordering fewer and less expensive tests, it reduced diagnostic costs by roughly 20 percent.
- A multi-agent orchestrator queries leading AI models (including OpenAI’s GPT, Google’s Gemini, Anthropic’s Claude, Meta’s Llama and xAI’s Grok) to mimic collaborative clinical reasoning.
- Microsoft is planning external clinical trials to verify real-world performance and is evaluating potential biases ahead of any consumer integration.