Particle.news

LMU–BR Study Finds AI Speech Models Struggle With Bavarian Dialects

Researchers attribute the performance gap to scarce dialect training data, urging larger corpora to enable future subtitling.

Overview

  • The joint project by LMU’s Center for Information and Language Processing and Bayerischer Rundfunk evaluated BR’s Betthupferl bedtime stories recorded across Bavaria’s major dialect groups.
  • Three speech-recognition model families were tested, including OpenAI’s Whisper, to compare performance on dialect recordings with Standard German samples.
  • Error rates on dialect speech were markedly higher than on Standard German, with BR reporting that sentence meaning was often lost in transcription.
  • Whisper handled Swiss German relatively well in the tests, but it still made significant errors on Bavarian dialects.
  • The validated findings were presented this week at the Interspeech conference in Rotterdam. The teams propose expanding dialect datasets to improve accuracy and support automated subtitling of dialect broadcasts.