Microsoft Introduces Compact AI Models with Multimodal Capabilities
The new Phi-4 models deliver high performance in text, speech, and vision tasks while requiring less computational power.
- Microsoft's Phi-4-Multimodal and Phi-4-Mini models are designed to process text, images, and speech efficiently, with parameter sizes of 5.6 billion and 3.8 billion, respectively.
- Phi-4-Multimodal integrates speech, vision, and text inputs using a novel 'mixture of LoRAs' technique, which attaches modality-specific low-rank adapters to a shared base model so that adding new modalities does not degrade accuracy on existing ones.
- Phi-4-Mini excels in text-based tasks, outperforming larger models on math, coding, and reasoning benchmarks, including scoring 88.6% on the GSM-8K math benchmark.
- Both models are optimized for deployment on standard hardware and edge devices, offering cost savings, reduced latency, and enhanced data privacy.
- The Phi-4 models are available through Azure AI Foundry, Hugging Face, and Nvidia API Catalog, enabling developers to create innovative applications across industries.
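To make the 'mixture of LoRAs' idea concrete, here is a minimal toy sketch in NumPy. This is not Microsoft's implementation; the dimensions, adapter scales, and the `forward` routing function are illustrative assumptions. It shows the core mechanism: a frozen base weight shared by all modalities, plus a small low-rank adapter (`A @ B`) selected per input modality.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2  # toy hidden size and LoRA rank (real models use far larger d)

# Frozen base projection weight, shared across all modalities.
W_base = rng.normal(size=(d, d))

# One low-rank adapter pair (A, B) per modality; only these would be trained.
adapters = {
    m: (rng.normal(size=(d, r)) * 0.01, rng.normal(size=(r, d)) * 0.01)
    for m in ("text", "vision", "speech")
}

def forward(x, modality):
    """Apply the frozen base weight plus the adapter routed for this modality."""
    A, B = adapters[modality]
    return x @ (W_base + A @ B)

x = rng.normal(size=(1, d))
out_text = forward(x, "text")
out_base = x @ W_base
```

Because each modality only contributes a low-rank update on top of the frozen base, training a new adapter leaves the other modalities' paths untouched, which is the intuition behind the "no performance degradation" claim above.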