Particle News: OpenAI Brings Realtime API to GA and Launches gpt‑realtime With MCP, SIP, and Image Input

Overview

OpenAI made its Realtime API generally available and released gpt‑realtime, a speech‑to‑speech model that can switch languages mid‑sentence and is positioned as its most advanced production voice model.
The API now supports the Model Context Protocol (MCP) for tool access, image inputs for on‑the‑fly visual understanding, and the Session Initiation Protocol (SIP) for direct phone connectivity to contact centers.
OpenAI reports stronger instruction following, expanded function calling, more natural and expressive speech, and recognition of non‑verbal cues. It cites benchmark scores of 82.8% on Big Bench Audio and 30.5% on MultiChallenge.
Two new voices, Cedar and Marin, are available exclusively through the API. Pricing for gpt‑realtime is 20% lower than for the earlier gpt‑4o‑realtime‑preview model, at $32 per million audio input tokens and $64 per million audio output tokens.
Customer demos highlighted enterprise use cases from T‑Mobile and Zillow, while the offering enters a crowded field that includes ElevenLabs, SoundHound, Hume, Mistral, and Google.
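
To make the audio pricing above concrete, here is a minimal Python sketch of the arithmetic. The two per-million rates come from the figures reported above. The function names and the example token counts are hypothetical, chosen only for illustration, and the pre-discount price is simply what the stated 20% reduction implies, not a figure quoted here. Text, image, and cached-input tokens are not covered.

```python
# Rates as reported in this article: USD per one million audio tokens.
AUDIO_INPUT_USD_PER_MILLION = 32.0
AUDIO_OUTPUT_USD_PER_MILLION = 64.0


def estimate_audio_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the audio-token cost of a session in USD.

    Hypothetical helper: covers audio tokens only, at the rates above.
    """
    input_cost = input_tokens / 1_000_000 * AUDIO_INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * AUDIO_OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


def implied_price_before_discount(discounted: float, discount: float = 0.20) -> float:
    """Back out the earlier price implied by a fractional price reduction."""
    return discounted / (1.0 - discount)


if __name__ == "__main__":
    # Assumed workload for illustration: 50,000 audio tokens in, 20,000 out.
    cost = estimate_audio_cost(50_000, 20_000)
    print(f"Estimated audio cost: ${cost:.2f}")

    earlier_input = implied_price_before_discount(AUDIO_INPUT_USD_PER_MILLION)
    earlier_output = implied_price_before_discount(AUDIO_OUTPUT_USD_PER_MILLION)
    print(f"Implied earlier rates: ${earlier_input:.2f} in, ${earlier_output:.2f} out")
```

Because output audio is billed at twice the input rate, a session's cost under these assumptions depends mostly on how much the model speaks, not on how much it hears.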