NVIDIA has open-sourced Kimodo (Kinematic Motion Diffusion), the largest controllable motion diffusion model trained to date. Released on March 28, 2026, Kimodo generates high-quality 3D human and robot motions from text prompts and was trained on over 700 hours of professional motion capture data from the Bones Rigplay dataset. The code and model weights are freely available under the NVIDIA Open Model License on GitHub, with a live demo on HuggingFace Spaces.

For the broader landscape, see our 2026 creator reference on open-source AI models.

What Happened

NVIDIA Research released Kimodo, a two-stage transformer denoiser that generates realistic 3D motion sequences from natural language descriptions. The model supports multiple skeleton formats including SOMA (parametric human body), SMPL-X, and the Unitree G1 humanoid robot skeleton, making it useful for both animation and robotics applications.

Kimodo was trained on 25 times more motion capture data than any prior model, drawing from 700+ hours of production-quality recordings with corresponding text descriptions. The architecture separates root and body motion prediction to minimize artifacts, and accepts text embeddings alongside kinematic constraints for precise control.
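To make the root/body split concrete, here is a minimal PyTorch sketch of how a two-stage denoiser can chain the two predictions, with the body stage conditioned on the denoised root and the text embedding. This is an illustrative toy, not Kimodo's published code: the class name, tensor shapes, and the simple MLPs standing in for the transformer stages are all assumptions.

```python
import torch
import torch.nn as nn

class TwoStageDenoiser(nn.Module):
    """Toy two-stage denoiser: predict the root trajectory first, then the
    body pose conditioned on the denoised root and the text embedding.
    Shapes and MLP stages are placeholders for illustration only."""
    def __init__(self, text_dim=512, root_dim=4, body_dim=63, hidden=256):
        super().__init__()
        self.root_net = nn.Sequential(
            nn.Linear(root_dim + text_dim + 1, hidden), nn.GELU(),
            nn.Linear(hidden, root_dim))
        self.body_net = nn.Sequential(
            nn.Linear(body_dim + root_dim + text_dim + 1, hidden), nn.GELU(),
            nn.Linear(hidden, body_dim))

    def forward(self, noisy_root, noisy_body, text_emb, t):
        B, T = noisy_root.shape[:2]
        t_feat = torch.full((B, T, 1), t)               # diffusion timestep as an extra feature
        text = text_emb[:, None, :].expand(B, T, -1)    # broadcast text embedding to every frame
        root = self.root_net(torch.cat([noisy_root, text, t_feat], dim=-1))
        body = self.body_net(torch.cat([noisy_body, root, text, t_feat], dim=-1))
        return root, body

# One denoising step on random data: 2 sequences of 120 frames each.
denoiser = TwoStageDenoiser()
root, body = denoiser(
    noisy_root=torch.randn(2, 120, 4),    # x, y, height, heading per frame (assumed layout)
    noisy_body=torch.randn(2, 120, 63),   # 21 joints x 3 rotation params (assumed layout)
    text_emb=torch.randn(2, 512),         # pooled text embedding, e.g. from a CLIP-style encoder
    t=0.5,                                # normalized diffusion timestep
)
print(root.shape, body.shape)             # torch.Size([2, 120, 4]) torch.Size([2, 120, 63])
```

Predicting the body pose with the denoised root already in hand is what keeps the two streams consistent; the real model applies the same idea inside a transformer denoiser rather than these toy MLPs.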

Key capabilities include text-to-motion generation for diverse behaviors (locomotion, dancing, gestures, object interaction), full-body keyframe constraints, end-effector positioning for hands and feet, root constraints via 2D waypoints, and the ability to combine multiple constraint types simultaneously. Generated motions are compatible with the ProtoMotions framework and MuJoCo for physics-based policy training.
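Combining constraints is easiest to picture as a single structured request that carries the prompt plus every constraint type at once. The sketch below shows one way such a request could look; the field names, skeleton identifier, and coordinate values are illustrative assumptions, not Kimodo's actual API, so check the repository documentation for the real interface.

```python
import numpy as np

# Illustrative request only -- field names and values are assumptions, not Kimodo's API.
request = {
    "prompt": "walk to the table, crouch, and pick up the box with the right hand",
    "skeleton": "smplx",            # hypothetical identifier; e.g. "unitree_g1" for the robot skeleton
    "num_frames": 240,              # 8 seconds at 30 fps
    "constraints": {
        # Full-body keyframe: pin the pose at frame 0 to a known rest pose.
        "keyframes": [{"frame": 0, "pose": np.zeros(63).tolist()}],
        # End-effector target: place the right hand at the box position near the end.
        "end_effectors": [{"frame": 220, "joint": "right_hand", "position": [0.6, 0.3, 0.9]}],
        # Root constraint: 2D waypoints the character should pass through on the ground plane.
        "root_waypoints": [{"frame": 0, "xy": [0.0, 0.0]}, {"frame": 150, "xy": [2.0, 1.0]}],
    },
}
print(f"{len(request['constraints'])} constraint types combined in one request")
```

The point is that keyframes, end-effector targets, and root waypoints can coexist in one generation call rather than requiring separate passes, which is what the simultaneous-constraint capability refers to.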

Why It Matters

Motion generation has been a persistent bottleneck in game development, film production, and robotics research. Professional motion capture sessions cost thousands of dollars per hour and require specialized equipment. Kimodo changes the economics by letting creators generate production-quality motions from simple text descriptions.

The open-source release is significant. Most competitive motion generation models remain locked behind commercial APIs or restrictive licenses. By releasing model weights, training code, and a free demo, NVIDIA is lowering the barrier for indie developers, researchers, and small studios working on 3D and spatial computing projects.

The multi-skeleton support also signals a broader trend: motion AI models are becoming general-purpose controllers, not just animation tools. Generating motions for the Unitree G1 robot directly from the same model that handles human animation suggests a future where a single motion model serves both digital characters and physical robots.

What to Do Next

Try the free HuggingFace demo to test text-to-motion generation without any setup. For integration into production pipelines, clone the GitHub repository and follow the installation guide. The model runs on a single GPU, so most modern workstations can handle it locally. If you are building 3D characters or working in robotics, start experimenting with the constraint system to see how keyframe and end-effector controls can fit your workflow.
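For local setup, a rough starting point might look like the sketch below. The HuggingFace repo id is a placeholder assumption (use the id listed on the Spaces demo or in the GitHub README), and the device check simply reflects the single-GPU claim above.

```python
# Placeholder setup sketch -- "nvidia/kimodo" is an assumed repo id, not a confirmed one.
from huggingface_hub import snapshot_download
import torch

local_dir = snapshot_download(repo_id="nvidia/kimodo")    # download the released weights locally
device = "cuda" if torch.cuda.is_available() else "cpu"   # a single modern GPU should be enough
print(f"Weights in {local_dir}, running on {device}")
```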