Columbia Robot Learns to Lip‑Sync by Watching YouTube and Studying Its Reflection

The Science Robotics paper presents a robotic face driven by 26 motors that maps audio to lifelike mouth motions, an early step toward natural human‑robot communication.

Overview

  • Columbia Engineering’s Creative Machines Lab first trained the system with mirror-based vision‑to‑action learning, in which the robot watched its own reflection to learn how motor commands shape its face, then used hours of human video to associate audio with lip shapes (a simplified sketch of this pipeline follows the list).
  • The study, led by Professor Hod Lipson with PhD researcher Yuhang Hu, reports peer‑reviewed results and a public demo of speaking and singing.
  • The robot performed a track from an AI‑generated debut album titled “hello world_,” demonstrating synchronized lip articulation while singing.
  • Researchers note current shortcomings with hard consonants like “B” and lip‑puckering sounds such as “W,” with performance expected to improve through continued training.
  • The team says pairing the capability with conversational AI such as ChatGPT or Gemini could open uses in entertainment, education, medicine, and elder care.
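
For a concrete picture of that two‑stage recipe, here is a minimal, heavily simplified sketch in Python. It is illustrative only, not the authors' implementation: the landmark count, audio‑feature dimension, linear models, and function names are assumptions made for this sketch; the only number taken from the study is the 26‑motor face.

```python
# Toy sketch of the two-stage pipeline (hypothetical, not the authors' code):
# stage 1 learns an inverse self-model (lip landmarks -> motor commands) from
# "mirror" observations of the robot's own face; stage 2 learns an
# audio-to-lip-shape map from human video; at runtime the two are chained.
import numpy as np

rng = np.random.default_rng(0)

N_MOTORS = 26        # the paper's face uses 26 motors
N_LANDMARKS = 16     # hypothetical number of tracked lip-landmark coordinates
N_AUDIO_FEATS = 13   # hypothetical MFCC-like audio feature dimension

# ---- Stage 1: self-modeling from the mirror ----
# The robot issues random motor commands and observes the resulting lip
# landmarks in its reflection (simulated here by a fixed linear "plant").
true_plant = rng.normal(size=(N_MOTORS, N_LANDMARKS))
motor_babble = rng.uniform(-1, 1, size=(5000, N_MOTORS))
observed_lips = motor_babble @ true_plant \
    + 0.01 * rng.normal(size=(5000, N_LANDMARKS))

# Fit the inverse model, landmarks -> motor commands, by least squares.
inverse_model, *_ = np.linalg.lstsq(observed_lips, motor_babble, rcond=None)

# ---- Stage 2: audio-to-lip-shape from human video ----
# Pairs of (audio features, lip landmarks) as would be extracted from
# talking-head footage (again simulated by a fixed linear map).
true_audio_to_lips = rng.normal(size=(N_AUDIO_FEATS, N_LANDMARKS))
audio_feats = rng.normal(size=(8000, N_AUDIO_FEATS))
video_lips = audio_feats @ true_audio_to_lips \
    + 0.01 * rng.normal(size=(8000, N_LANDMARKS))

audio_model, *_ = np.linalg.lstsq(audio_feats, video_lips, rcond=None)

# ---- Runtime: chain the two learned models ----
def audio_to_motor_commands(frame_audio_feats: np.ndarray) -> np.ndarray:
    """Predict lip landmarks from audio, then solve for motor commands."""
    predicted_lips = frame_audio_feats @ audio_model
    return predicted_lips @ inverse_model

cmd = audio_to_motor_commands(rng.normal(size=(1, N_AUDIO_FEATS)))
print(cmd.shape)  # (1, 26): one command per motor for this audio frame
```

In the real system the linear stand-ins would presumably be neural networks trained on camera footage and hours of human video, but the chaining of a learned audio-to-lip model with a learned inverse self-model mirrors the two-stage training the article describes.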