NVIDIA Corp. today announced new artificial intelligence and simulation tools to accelerate development of robots, including humanoids. Also at the Conference on Robot Learning, Hugging Face Inc. and NVIDIA said they are combining their open-source AI and robotics efforts to accelerate research and development.
The tools include the generally available NVIDIA Isaac Lab robot learning framework and six new robot learning workflows for the Project GR00T initiative to accelerate humanoid development. They also include new world-model development tools for video data curation and processing, including the NVIDIA Cosmos tokenizer and NVIDIA NeMo Curator for video processing.
Hugging Face said its LeRobot open AI platform combined with NVIDIA AI, Omniverse and Isaac robotics technology will enable advances across industries including manufacturing, healthcare, and logistics.
NVIDIA Isaac Lab to help train humanoids
Isaac Lab is an open-source robot learning framework built on NVIDIA Omniverse, a platform for developing OpenUSD applications for industrial digitalization and physical AI simulation. Developers can use Isaac Lab to train policies at scale for all types of robot movement, from collaborative robots and quadrupeds to humanoids, said NVIDIA.
The company said leading research entities, robotics manufacturers, and application developers around the world are using Isaac Lab. They include 1X, Agility Robotics, The AI Institute, Berkeley Humanoid, Boston Dynamics, Field AI, Fourier, Galbot, Mentee Robotics, Skild AI, Swiss-Mile, Unitree Robotics, and XPENG Robotics.
A guide to migrating from Isaac Gym is available online, and NVIDIA Isaac Lab 1. is available now on GitHub.
Project GR00T offers blueprints for general-purpose robots
Announced at the Graphics Processing Unit Technology Conference (GTC) in March, Project GR00T aims to develop libraries, foundation models, and data pipelines to support the global developer ecosystem for humanoid robots. NVIDIA said six new workflows are coming soon to help robots perceive, move, and interact with people and their environments:
- GR00T-Gen for building generative AI-powered, OpenUSD-based 3D environments
- GR00T-Mimic for robot motion and trajectory generation
- GR00T-Dexterity for robot dexterous manipulation
- GR00T-Control for whole-body control
- GR00T-Mobility for robot locomotion and navigation
- GR00T-Perception for multimodal sensing
“Humanoid robots are the next wave of embodied AI,” said Jim Fan, senior research manager of embodied AI at NVIDIA. “NVIDIA research and engineering teams are collaborating across the company and our developer ecosystem to build Project GR00T to help advance the progress and development of global humanoid robot developers.”
Cosmos tokenizers minimize distortion
As developers build world models, or AI representations of how objects and environments might respond to a robot’s actions, they need thousands of hours of real-world image or video data. NVIDIA said its Cosmos tokenizers provide high-quality encoding and decoding to simplify the development of these world models with minimal distortion and temporal instability.
The company said the open-source Cosmos tokenizer runs up to 12x faster than current tokenizers. It is available now on GitHub and Hugging Face. XPENG Robotics, Hillbot, and 1X Technologies are using the tokenizer.
“NVIDIA Cosmos tokenizer achieves really high temporal and spatial compression of our data while still retaining visual fidelity,” said Eric Jang, vice president of AI at 1X Technologies, which has updated the 1X World Model dataset. “This allows us to train world models with long horizon video generation in an even more compute-efficient manner.”
NeMo Curator handles video data
Curating video data poses challenges due to its massive size, requiring scalable pipelines and efficient orchestration for load balancing across GPUs. In addition, models for filtering, captioning, and embedding need optimization to maximize throughput, noted NVIDIA.
NeMo Curator streamlines data curation with automatic pipeline orchestration, reducing video processing time. The company said this pipeline enables robot developers to improve their world-model accuracy by processing large-scale text, image, and video data.
The system supports linear scaling across multi-node, multi-GPU systems, efficiently handling more than 100 petabytes of data. This can simplify AI development, reduce costs, and accelerate time to market, NVIDIA claimed.
NeMo Curator for video processing will be available at the end of the month.
Hugging Face, NVIDIA share tools for data and simulation
Hugging Face and NVIDIA announced at the Conference on Robot Learning (CoRL) in Munich, Germany, that they’re collaborating to accelerate open-source robotics research with LeRobot, NVIDIA Isaac Lab, and NVIDIA Jetson. They said their open-source frameworks will enable “the era of physical AI,” in which robots understand their environments and transform industry.
More than 5 million machine-learning researchers use New York-based Hugging Face’s AI platform, which includes APIs with more than 1.5 million models, datasets, and applications. LeRobot offers tools for sharing data collection, model training, and simulation environments, as well as low-cost manipulator kits.
Those tools now work with Isaac Lab on Isaac Sim, enabling robot training by demonstration or trial and error in realistic simulation. The planned collaborative workflow starts with collecting data through teleoperation and simulation in Isaac Lab and storing it in the standard LeRobotDataset format.
Data generated using GR00T-Mimic will then be used to train a robot policy with imitation learning, which is subsequently evaluated in simulation. Finally, the validated policy is deployed on real-world robots with NVIDIA Jetson for real-time inference.
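For readers unfamiliar with imitation learning, the collect, train, and evaluate steps of such a workflow can be sketched in a few lines of plain Python. This is a toy illustration only: it does not use LeRobot, Isaac Lab, GR00T-Mimic, or Jetson APIs, and the one-dimensional world, the scripted "expert," and every function name below are hypothetical stand-ins chosen for the example.

```python
# Toy sketch of a collect -> train -> evaluate loop for imitation learning.
# Not LeRobot or Isaac Lab code; all names and numbers here are hypothetical.
import random


def expert_action(position, target=1.0, gain=0.5):
    """Stand-in for a teleoperator: move a fraction of the way to the target."""
    return gain * (target - position)


def collect_demonstrations(num_episodes=20, steps=10, seed=0):
    """Record (observation, action) pairs from the expert in a 1-D toy world."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_episodes):
        position = rng.uniform(-1.0, 1.0)
        for _ in range(steps):
            action = expert_action(position)
            data.append((position, action))
            position += action
    return data


def fit_linear_policy(data):
    """Behavior cloning via ordinary least squares: action ~ w * obs + b."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in data)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def evaluate(policy, start=-0.8, steps=30, target=1.0):
    """Roll out the learned policy in the toy simulation; return final error."""
    w, b = policy
    position = start
    for _ in range(steps):
        position += w * position + b
    return abs(target - position)


demos = collect_demonstrations()      # step 1: gather demonstrations
policy = fit_linear_policy(demos)     # step 2: train by imitation
final_error = evaluate(policy)        # step 3: check the policy in simulation
print(f"learned w={policy[0]:.3f}, b={policy[1]:.3f}, error={final_error:.2e}")
```

In a real pipeline the demonstrations would come from teleoperation or simulation, the policy would be a neural network rather than a fitted line, and only a policy that passes evaluation in simulation would move on to hardware.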
Initial steps in this collaboration have shown a physical picking setup with LeRobot software running on NVIDIA Jetson Orin Nano, providing a compact compute platform for deployment.
“Combining Hugging Face open-source community with NVIDIA’s hardware and Isaac Lab simulation has the potential to accelerate innovation in AI for robotics,” said Remi Cadene, principal research scientist at LeRobot.
Also at CoRL, NVIDIA released 23 papers and presented nine workshops related to advances in robot learning. The papers cover integrating vision language models (VLMs) for improved environmental understanding and task execution, temporal robot navigation, developing long-horizon planning strategies for complex multistep tasks, and using human demonstrations for skill acquisition.
Papers for humanoid robot control and synthetic data generation include SkillGen, a system based on synthetic data generation for training robots with minimal human demonstrations, and HOVER, a robot foundation model for controlling humanoid locomotion and manipulation.
The post NVIDIA adds open AI and simulation tools for robot learning, humanoid development appeared first on The Robot Report.