Particle.news

OpenAI Brings Realtime API to GA and Launches gpt‑realtime With MCP, SIP, and Image Input

Lower pricing and improved instruction following signal a push to move voice agents from demos to production.

Overview

  • OpenAI made its Realtime API generally available and released gpt‑realtime, a speech‑to‑speech model that can switch languages mid‑sentence and is positioned as its most advanced production voice model.
  • The API now supports Model Context Protocol for tool access, image inputs for on‑the‑fly visual understanding, and Session Initiation Protocol for direct phone connectivity to contact centers.
  • OpenAI reports stronger instruction following, expanded function calling, more natural and expressive speech, and recognition of non‑verbal cues, with reported scores of 82.8% on Big Bench Audio and 30.5% on MultiChallenge.
  • Two API‑only voices, Cedar and Marin, are available, and pricing for gpt‑realtime is reduced by 20% to $32 per million audio input tokens and $64 per million audio output tokens.
  • Customer demos highlighted enterprise use cases from T‑Mobile and Zillow, while the offering enters a crowded field that includes ElevenLabs, SoundHound, Hume, Mistral, and Google.
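
To make the pricing figures above concrete, the following is a minimal Python sketch of the arithmetic, not an official calculator. It uses only the two audio rates quoted in the overview. The function names and the example token counts are hypothetical, and the "implied previous price" is backed out from the reported 20% cut on the assumption that the cut applied uniformly to both rates; it is not a figure stated in this summary. Any other charges (text tokens, images, cached input) are ignored.

```python
# Rates quoted in the overview, in USD per one million audio tokens.
AUDIO_INPUT_USD_PER_MILLION = 32.0
AUDIO_OUTPUT_USD_PER_MILLION = 64.0
PRICE_CUT = 0.20  # the reported 20% reduction


def estimate_audio_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a session from its audio token counts."""
    total = (
        input_tokens * AUDIO_INPUT_USD_PER_MILLION
        + output_tokens * AUDIO_OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


def implied_previous_price(current_price: float, cut: float = PRICE_CUT) -> float:
    """Back out the pre-cut price, assuming a uniform percentage cut."""
    return current_price / (1.0 - cut)


if __name__ == "__main__":
    # Hypothetical session; the token counts are made up for illustration.
    cost = estimate_audio_cost(input_tokens=50_000, output_tokens=25_000)
    print(f"Estimated session cost: ${cost:.2f}")  # $3.20

    old_in = implied_previous_price(AUDIO_INPUT_USD_PER_MILLION)
    old_out = implied_previous_price(AUDIO_OUTPUT_USD_PER_MILLION)
    print(f"Implied previous rates: ${old_in:.2f} in, ${old_out:.2f} out")
```

Because output audio is priced at twice the input rate, a session's cost is driven mostly by how much the model speaks rather than how much the caller does.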