Overview
- Project Ire uses large language models and a tool-use API to autonomously reverse-engineer and classify software binaries without prior context, creating a transparent chain of evidence for expert audit.
- In evaluations on a public Windows driver dataset, the prototype achieved 98% precision with a 2% false positive rate. On a tougher real-world set it reached 89% precision with a 4% false positive rate but only 26% recall, meaning it caught roughly a quarter of the malicious files.
- Microsoft acknowledged that the agent’s recall remains modest despite its strong precision, and is working to raise the share of malware samples it detects.
- Microsoft plans to deploy Project Ire as a Binary Analyzer within Defender to enable first-encounter malware classification at greater speed and scale.
- The long-term goal is for the AI agent to autonomously detect novel malware directly in memory, accelerating threat response as attackers leverage AI.
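
The precision, recall, and false positive rate figures quoted above all derive from the same four confusion-matrix counts. The sketch below shows the standard definitions. The counts are hypothetical, chosen only to land near the real-world figures; they are not taken from Microsoft's evaluation, and the function name is illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, and false positive rate.

    tp: malicious files flagged as malicious
    fp: benign files flagged as malicious
    fn: malicious files that were missed
    tn: benign files correctly left unflagged
    Assumes every denominator is nonzero.
    """
    precision = tp / (tp + fp)            # share of flagged files that are truly malicious
    recall = tp / (tp + fn)               # share of malicious files that were flagged
    false_positive_rate = fp / (fp + tn)  # share of benign files wrongly flagged
    return precision, recall, false_positive_rate


# Hypothetical counts for illustration only (NOT from the Project Ire evaluation):
# 100 malicious files, of which 26 are caught; 75 benign files, of which 3 are flagged.
tp, fp, fn, tn = 26, 3, 74, 72

precision, recall, fpr = classification_metrics(tp, fp, fn, tn)
print(f"precision={precision:.2f} recall={recall:.2f} fpr={fpr:.2f}")
```

With counts like these, a classifier can be right about almost everything it flags (high precision, low false positive rate) while still missing most of the malware (low recall), which is the trade-off the evaluation figures describe.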