Alibaba Unveils EMO: AI That Animates Photos into Talking, Singing Videos

The new AI system, developed by Alibaba's Institute for Intelligent Computing, generates lifelike videos from a single photo and an audio track, drawing both admiration and ethical concerns.

Overview

A team at Alibaba's Institute for Intelligent Computing has developed EMO, an AI system that generates realistic videos of people talking or singing from a single photo and audio track.
EMO synthesizes video directly from audio, without intermediate 3D models or facial landmarks, using a diffusion model trained on more than 250 hours of audio and video data.
The technology can create expressive facial animations and head poses, accurately mimicking human speech and singing expressions.
Demonstrations include making historical figures and fictional characters speak or sing, showcasing EMO's ability to produce videos with high realism and expressiveness.
Concerns have been raised that the technology could be misused for impersonation or for spreading misinformation.

Particle.news
