Particle News: UW Researchers Unveil Spatial Speech Translation for Real-Time Multi-Speaker Translation

Overview

Spatial Speech Translation uses off-the-shelf noise-cancelling headphones and binaural microphones to separate, track, and translate multiple speakers with a 2–4 second delay.
The system preserves each speaker’s unique voice traits and spatial directionality, even as they move through a 360-degree adaptive tracking algorithm.
All processing occurs locally on Apple M2-powered devices, such as laptops and Vision Pro, avoiding cloud computing to safeguard user privacy.
Tested in 10 diverse environments, the system outperformed non-tracking models in user preference, with participants favoring a 3–4 second delay for accuracy.
The proof-of-concept code has been released as open-source, enabling further development and potential expansion to more languages beyond Spanish, German, and French.