EVERYDAY ROBOTICS AT GOOGLE X
During the summer of 2019, I interned at X, Google’s research and development subsidiary. While there, I worked on their Everyday Robot project, which aimed to build a fleet of mobile manipulator robots capable of operating in unstructured environments and performing everyday human tasks. I worked closely with hardware, software, and machine learning engineers toward these goals.
When I joined, the Everyday Robot team was focused on iterating on and improving the design of their robots. I assisted with the software design and calibration of the perception system, facilitated machine learning experiments for training manipulation policies, and applied my background in motion planning to design experiments comparing various manipulation pipelines.
I can personally attest to the engineering effort it takes to implement a functioning, versatile pipeline built on methods from algorithmic robotics (such as task and motion planning). To avoid that cost, the Everyday Robot team chose to bet much of the project’s future on learning a basic visuomotor policy for manipulation. They trained this model using the existing fleet to sort trash into recycling, compost, or trash bins. Nowadays, many methods can achieve far more complex tasks in unstructured environments (see [1], [2], [3], [4], [5], [6], [7]). These newer works use modern ML architectures like diffusion models and transformers, which differ fundamentally from the work being done in 2019.
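To make the idea concrete, here is a minimal sketch of what a basic visuomotor policy for this kind of task can look like: a small convolutional network that maps a single RGB camera frame to one of three sorting actions. The architecture, class name, and action set are illustrative assumptions written in PyTorch, not the actual model or framework the Everyday Robot team used.

import torch
import torch.nn as nn

class TrashSortingPolicy(nn.Module):
    """Illustrative visuomotor policy: camera image in, bin choice out."""

    def __init__(self, num_actions: int = 3):
        super().__init__()
        # Convolutional encoder that compresses the camera image into a feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Small head that scores each sorting action.
        self.head = nn.Sequential(
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (batch, 3, H, W) normalized RGB; returns per-action logits.
        return self.head(self.encoder(image))

# Usage: pick the highest-scoring bin for a batch of placeholder camera frames.
policy = TrashSortingPolicy()
frames = torch.rand(4, 3, 224, 224)           # stand-in for real camera images
actions = policy(frames).argmax(dim=-1)       # 0 = recycling, 1 = compost, 2 = trash

In practice such a policy would be trained with imitation or reinforcement learning on fleet data; the point of the sketch is simply that the entire mapping from pixels to actions lives in one learned model rather than a hand-engineered pipeline.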
I have also noticed that the robots from the Everyday Robot project have been used in several works combining large language models (LLMs) with vision-based policies to solve long-horizon tasks in unstructured environments [8], [9]. I’m excited to see what X, Google Brain, and Google DeepMind get up to in the future with the robots developed through the Everyday Robot project!
I loved working with the team at X: they brought a vibrant energy that fueled my passion for innovation. I made some good friends whom I still keep up with!
[1] D. Ghosh et al., “Octo: An Open-Source Generalist Robot Policy”.
[2] T. Z. Zhao, V. Kumar, S. Levine, and C. Finn, “Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware.” arXiv, Apr. 23, 2023. doi: 10.48550/arXiv.2304.13705.
[3] Z. Fu, T. Z. Zhao, and C. Finn, “Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation.” arXiv, Jan. 04, 2024. doi: 10.48550/arXiv.2401.02117.
[4] C. Chi et al., “Diffusion Policy: Visuomotor Policy Learning via Action Diffusion.” arXiv, Jun. 01, 2023. doi: 10.48550/arXiv.2303.04137.
[5] J. Pari, N. M. Shafiullah, S. P. Arunachalam, and L. Pinto, “The Surprising Effectiveness of Representation Learning for Visual Imitation.” arXiv, Dec. 06, 2021. doi: 10.48550/arXiv.2112.01511.
[6] D. Shah, A. Sridhar, A. Bhorkar, N. Hirose, and S. Levine, “GNM: A General Navigation Model to Drive Any Robot.” arXiv, May 22, 2023. doi: 10.48550/arXiv.2210.03370.
[7] J. Yang, D. Sadigh, and C. Finn, “Polybot: Training One Policy Across Robots While Embracing Variability.” arXiv, Jul. 07, 2023. doi: 10.48550/arXiv.2307.03719.
[8] M. Ahn et al., “Do As I Can, Not As I Say: Grounding Language in Robotic Affordances.” arXiv, Aug. 16, 2022. doi: 10.48550/arXiv.2204.01691.
[9] T. Xiao et al., “Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models.” arXiv, Jul. 01, 2023. doi: 10.48550/arXiv.2211.11736.