
Following the release and open-sourcing of its vision-language-action (VLA) large model Xiaomi-Robotics-0 in February this year, Xiaomi today announced the official launch of the model’s full end-to-end post-training workflow for real-world deployment. The move brings the model, which once ranked sixth on Hugging Face’s global VLA model download leaderboard, one step closer to becoming a “plug-and-play” productivity tool.
According to Xiaomi, starting from the pre-trained foundation model and conducting just 20 hours of task-specific fine-tuning on real hardware, the team enabled the robot to master the highly challenging task of “stowing earbuds into their charging case” and to carry out multiple stowage operations in succession. The task looks simple but is difficult in two respects. First, the clearance between the earbuds and the storage slot is extremely tight, so precise alignment requires sub-millimeter spatial perception accuracy. Second, the surface roughness of both the earbuds and the case can be as low as Ra 0.03 µm, which makes the parts prone to slipping on contact; the robot must therefore correct motion deviations quickly to avoid an assembly failure.
Xiaomi presents the release of the complete post-training workflow as a demonstration of the VLA model’s ability to learn precision manipulation tasks quickly. The company states that developers and industry users will be able to perform efficient, scenario-specific fine-tuning on the open-source foundation model using significantly less data and compute. As a result, Xiaomi-Robotics-0 can evolve more quickly from a general-purpose pre-trained model into a specialized robotic intelligence agent capable of solving real-world problems.