Overview
- The peer-reviewed FastDriveVLA paper was accepted to AAAI 2026, which admitted 4,167 of 23,680 submissions (a 17.6% acceptance rate).
- FastDriveVLA prunes visual tokens from 3,249 to 812 (roughly a 75% reduction), which is reported to cut computational load by nearly 7.5-fold while maintaining planning accuracy.
- The approach uses adversarial foreground–background reconstruction to retain critical cues such as lanes, vehicles, and pedestrians (see the sketch after this list).
- Results are reported on the nuScenes benchmark and are presented as making real-time inference more practical for end-to-end VLA driving models.
- The acceptance follows XPENG's appearance at the CVPR Workshop on Autonomous Driving (WAD) and its VLA 2.0 reveal, which the company frames as progress toward Level 4 autonomy.
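To make the token-pruning idea concrete, below is a minimal, hypothetical PyTorch sketch of keeping the top-scoring 812 of 3,249 visual tokens before they reach the planner. The `TokenPruner` module, its dimensions, and the linear scoring head are illustrative assumptions, not FastDriveVLA's actual implementation; in the paper, the relevance signal is described as coming from adversarial foreground–background reconstruction rather than a simple learned score.

```python
import torch
import torch.nn as nn


class TokenPruner(nn.Module):
    """Keeps the top-k highest-scoring visual tokens (e.g. 812 of 3,249)."""

    def __init__(self, dim: int, keep_tokens: int):
        super().__init__()
        self.keep_tokens = keep_tokens
        # Hypothetical per-token relevance head; in FastDriveVLA the
        # foreground/background signal is learned adversarially instead.
        self.score_head = nn.Linear(dim, 1)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, dim)
        scores = self.score_head(tokens).squeeze(-1)            # (batch, num_tokens)
        keep = scores.topk(self.keep_tokens, dim=1).indices     # top-k token indices
        keep = keep.sort(dim=1).values                          # keep original spatial order
        batch_idx = torch.arange(tokens.size(0)).unsqueeze(1)   # (batch, 1) for indexing
        return tokens[batch_idx, keep]                          # (batch, keep_tokens, dim)


# Example: prune 3,249 visual tokens down to 812 before the VLA planner sees them.
pruner = TokenPruner(dim=1024, keep_tokens=812)
visual_tokens = torch.randn(2, 3249, 1024)
pruned = pruner(visual_tokens)
print(pruned.shape)  # torch.Size([2, 812, 1024])
```

Since the planner's attention cost grows with the number of visual tokens, dropping from 3,249 to 812 tokens is what drives the reported reduction in computational load.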