On June 3, 2026, NVIDIA detailed physical AI research unveiled at CVPR 2026, including the launch of Cosmos 3, a new open foundation model, alongside a free library of AI agent skills for vision, 3D reconstruction, and robotics simulation.
What Happened
NVIDIA Cosmos 3 is the company's newest open physical AI foundation model, which NVIDIA describes as "the world's first full omnimodel unifying vision reasoning, world and action generation." The model is built on a mixture-of-transformers architecture: a reasoning transformer processes visual observations and passes instructions to a generation tower, so a single model handles both understanding of the physical world and generation.
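The two-stage flow described above can be pictured with a toy sketch. This is not Cosmos 3 code and does not reflect its real API: the class names `ReasoningTransformer` and `GenerationTower`, the `Instruction` type, and the arithmetic inside them are all hypothetical stand-ins, chosen only to show one component reading observations and handing a structured instruction to another.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instruction:
    """Hypothetical message passed from the reasoning stage to the generation stage."""
    goal: str
    context: List[float]


class ReasoningTransformer:
    """Stand-in for the reasoning stage: summarizes visual observations."""

    def reason(self, frames: List[List[float]], prompt: str) -> Instruction:
        # Toy "understanding": reduce each (non-empty) frame to its mean feature value.
        context = [sum(frame) / len(frame) for frame in frames]
        return Instruction(goal=prompt, context=context)


class GenerationTower:
    """Stand-in for the generation stage: produces outputs conditioned on the instruction."""

    def generate(self, instruction: Instruction, steps: int) -> List[float]:
        # Toy "generation": extrapolate forward from the most recent context value.
        last = instruction.context[-1]
        return [last + 0.1 * (i + 1) for i in range(steps)]


def run_pipeline(
    frames: List[List[float]], prompt: str, steps: int = 3
) -> Tuple[Instruction, List[float]]:
    """Run reasoning first, then pass its instruction to generation."""
    instruction = ReasoningTransformer().reason(frames, prompt)
    return instruction, GenerationTower().generate(instruction, steps)


if __name__ == "__main__":
    instruction, rollout = run_pipeline([[0.0, 2.0], [1.0, 3.0]], "push the box")
    print(instruction.goal, len(rollout))
```

The point of the sketch is the interface, not the math: the generation stage never sees raw frames, only what the reasoning stage chooses to pass along.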
Alongside Cosmos 3, NVIDIA released the Physical AI Skills library, an open-source collection of tools available at github.com/NVIDIA/skills. Key tools include:
- Neural Reconstruction: converts real-world camera footage into editable 3D scenes for simulation and synthetic data generation
- Metropolis Vision Skills: generates synthetic visual scenarios for training inspection and anomaly detection models, including defect examples across different surfaces
- InstantNuRec: fast neural reconstruction for turning multi-view images into editable 3D representations
- Isaac Sim 6.0 Agent Skills: automates robotic simulation scene preparation, policy training, and build-and-evaluate loops
Why It Matters
These tools make enterprise-grade physical AI workflows accessible to any researcher or developer. The NVIDIA Cosmos family, previously used by autonomous vehicle and robotics teams, is now available as an open frontier model on GitHub and through the NVIDIA model catalog.
For AI practitioners working with synthetic visual data, the Metropolis skills lower the cost of generating training datasets. Rare defect cases, unusual lighting conditions, and uncommon object configurations that are expensive to capture in the real world can be synthesized directly from reference images. This eases a major bottleneck in computer vision model development: collecting enough examples of the cases a model most needs to see.
The Neural Reconstruction workflow is particularly notable for 3D content creators: footage of a physical object or environment is converted into an editable scene representation, removing a manual modeling step when building simulation environments or synthetic datasets.
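To make the idea of synthesizing rare cases from a reference image concrete, here is a minimal sketch using plain Python lists as grayscale images. It is a generic augmentation stand-in, not the Metropolis skills' method or API: the function names, the square "defect" patch, and the brightness range are all assumptions made for illustration.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale pixels in [0.0, 1.0]


def adjust_lighting(image: Image, gain: float) -> Image:
    """Scale brightness by `gain` and clamp each pixel to [0, 1]."""
    return [[min(1.0, max(0.0, p * gain)) for p in row] for row in image]


def add_defect(image: Image, top: int, left: int, size: int, value: float = 0.0) -> Image:
    """Paint a square 'defect' patch onto a copy of the image."""
    out = [row[:] for row in image]  # copy so the reference stays untouched
    for r in range(top, min(top + size, len(out))):
        for c in range(left, min(left + size, len(out[r]))):
            out[r][c] = value
    return out


def synthesize_variants(reference: Image, count: int, seed: int = 0) -> List[Image]:
    """Create `count` variants with random lighting and one random defect each."""
    rng = random.Random(seed)  # seeded for reproducible datasets
    height, width = len(reference), len(reference[0])
    variants = []
    for _ in range(count):
        lit = adjust_lighting(reference, rng.uniform(0.5, 1.5))
        top, left = rng.randrange(height), rng.randrange(width)
        variants.append(add_defect(lit, top, left, size=2))
    return variants


if __name__ == "__main__":
    reference = [[0.5] * 4 for _ in range(4)]
    print(len(synthesize_variants(reference, count=5)))
```

A real pipeline would render photorealistic defects on varied surfaces; the sketch only shows why one reference image can yield many labeled training examples.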
Key Details
- Cosmos 3 is free and available through GitHub and the NVIDIA model catalog
- All Physical AI Skills are open-source (MIT/Apache licensed)
- InstantNuRec is available at github.com/NVIDIA/instant-nurec for fast neural reconstruction
- CVPR 2026 is running this week; NVIDIA is also presenting research on GraspGen-X (robotic grasping), LCDrive (autonomous driving), and NitroGen (embodied agents in video games) alongside these tool releases
- Preconfigured Launchable environments are available on NVIDIA Brev for running the skills on H100 GPUs
Creator Outcome: How to Start
The easiest entry point is NVIDIA Brev, where free trial compute credits let you run Physical AI Launchables on H100 GPUs without a local GPU setup. The Physical AI Skills repo on GitHub includes documentation for each tool, with Neural Reconstruction and Metropolis being the most relevant for 3D and visual AI workflows. If you work with synthetic data generation for model training, or want to experiment with converting real footage into editable 3D scenes, this stack is now available at no cost.