• March 16, 2026
Why Box-Lifting Humanoids Are Hard

At first glance, this problem seems like it should already be solved. Vision models can detect boxes. Speech systems can understand simple commands. Robots can run, jump, dance, and even play the piano in carefully staged demos. So why is pick up that box still such a hard problem for humanoids in factories? The answer is that the hard part is not object detection. The hard part is integrating perception, language, manipulation, balance, safety, and error recovery into one system that works reliably in the real world. In robotics terms, the challenge is not a single model. It is the full stack.