Overview
- The study, presented at the International Conference on Learning Representations, compared more than 350 AI models with human performance in interpreting three-second videos of social interactions.
- Humans consistently outperformed AI models in understanding social dynamics, with AI struggling to predict human ratings and brain responses accurately.
- Language models performed better than image and video models but relied heavily on descriptive captions, exposing the limitations of models that analyze video directly.
- Researchers attribute these shortcomings to how AI models are designed: their architecture is inspired by the brain areas that process static images and lacks mechanisms for understanding dynamic scenes.
- The findings underscore the risks of deploying such AI in autonomous vehicles and humanoid robots and highlight the need for interdisciplinary approaches to improve AI's social understanding.