Particle.news

Alibaba Unveils EMO: AI That Animates Photos into Talking, Singing Videos

The new AI system, developed by Alibaba's Institute for Intelligent Computing, generates lifelike videos from a single photo and an audio clip, drawing both admiration and ethical concerns.

  • A team at Alibaba's Institute for Intelligent Computing has developed EMO, an AI system that generates realistic videos of people talking or singing from a single photo and audio track.
  • EMO uses a novel audio-to-video synthesis approach that does not require 3D models or facial landmarks. Instead, it relies on a diffusion model trained on more than 250 hours of audio and video data.
  • The technology produces expressive facial animations and head poses that closely mimic the expressions people make when speaking and singing.
  • Demonstrations include making historical figures and fictional characters speak or sing, showcasing EMO's ability to produce videos with high realism and expressiveness.
  • Concerns have been raised about the potential misuse of this technology for unethical purposes, such as impersonation or spreading misinformation.