Columbia Robot Learns to Lip‑Sync by Watching YouTube and Studying Its Reflection

The Science Robotics paper presents a robotic face driven by 26 motors that maps audio to lifelike mouth motions, an early step toward natural human‑robot communication.

Overview

  • Columbia Engineering’s Creative Machines Lab first trained the system with mirror-based vision‑to‑action learning, in which the robot watched its own reflection to learn how motor commands shape its face, then used hours of human video to associate audio with lip shapes (a simplified sketch of this pipeline follows the list).
  • The study, led by Professor Hod Lipson with PhD researcher Yuhang Hu, reports peer‑reviewed results and a public demo of speaking and singing.
  • The robot performed a track from an AI‑generated debut album titled “hello world_,” demonstrating synchronized lip articulation while singing.
  • Researchers note current shortcomings with hard consonants like “B” and lip‑puckering sounds such as “W,” with performance expected to improve through continued training.
  • The team says pairing the capability with conversational AI such as ChatGPT or Gemini could open uses in entertainment, education, medicine, and elder care.
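
For a concrete picture of that two‑stage recipe, here is a minimal, heavily simplified sketch in Python. It is illustrative only, not the authors' implementation: the landmark count, audio‑feature dimension, linear models, and function names are assumptions made for this sketch; the only number taken from the study is the 26‑motor face.

```python
# Toy sketch of the two-stage pipeline (hypothetical, not the authors' code):
# stage 1 learns an inverse self-model (lip landmarks -> motor commands) from
# "mirror" observations of the robot's own face; stage 2 learns an
# audio-to-lip-shape map from human video; at runtime the two are chained.
import numpy as np

rng = np.random.default_rng(0)

N_MOTORS = 26        # the paper's face uses 26 motors
N_LANDMARKS = 16     # hypothetical number of tracked lip-landmark coordinates
N_AUDIO_FEATS = 13   # hypothetical MFCC-like audio feature dimension

# ---- Stage 1: self-modeling from the mirror ----
# The robot issues random motor commands and observes the resulting lip
# landmarks in its reflection (simulated here by a fixed linear "plant").
true_plant = rng.normal(size=(N_MOTORS, N_LANDMARKS))
motor_babble = rng.uniform(-1, 1, size=(5000, N_MOTORS))
observed_lips = motor_babble @ true_plant \
    + 0.01 * rng.normal(size=(5000, N_LANDMARKS))

# Fit the inverse model, landmarks -> motor commands, by least squares.
inverse_model, *_ = np.linalg.lstsq(observed_lips, motor_babble, rcond=None)

# ---- Stage 2: audio-to-lip-shape from human video ----
# Pairs of (audio features, lip landmarks) as would be extracted from
# talking-head footage (again simulated by a fixed linear map).
true_audio_to_lips = rng.normal(size=(N_AUDIO_FEATS, N_LANDMARKS))
audio_feats = rng.normal(size=(8000, N_AUDIO_FEATS))
video_lips = audio_feats @ true_audio_to_lips \
    + 0.01 * rng.normal(size=(8000, N_LANDMARKS))

audio_model, *_ = np.linalg.lstsq(audio_feats, video_lips, rcond=None)

# ---- Runtime: chain the two learned models ----
def audio_to_motor_commands(frame_audio_feats: np.ndarray) -> np.ndarray:
    """Predict lip landmarks from audio, then solve for motor commands."""
    predicted_lips = frame_audio_feats @ audio_model
    return predicted_lips @ inverse_model

cmd = audio_to_motor_commands(rng.normal(size=(1, N_AUDIO_FEATS)))
print(cmd.shape)  # (1, 26): one command per motor for this audio frame
```

In the real system the linear stand-ins would presumably be neural networks trained on camera footage and hours of human video, but the chaining of a learned audio-to-lip model with a learned inverse self-model mirrors the two-stage training the article describes.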