Kiwi-Edit, a new open-source video editing framework from NUS Show Lab, launched on March 5, 2026. It combines text-instruction guidance with reference-image control to handle both global and local video edits at 720p. Built on Qwen2.5-VL-3B and Wan2.2-TI2V-5B, the MIT-licensed model scores 3.02 on OpenVE-Bench, the highest among open-source video editing methods.

What Happened

Researchers at the National University of Singapore's Show Lab released Kiwi-Edit, a unified framework for instruction-guided and reference-guided video editing. Unlike models that rely solely on text prompts, Kiwi-Edit lets users supply a reference image alongside natural language instructions to guide the visual output. The full release includes all datasets, model weights, training code, and a HuggingFace demo for immediate testing.

Why It Matters

Text-only video editing hits a wall when you need a specific visual style or object appearance that words cannot precisely describe. Kiwi-Edit addresses this by accepting reference images as a second input channel. Need a character wearing a specific outfit, or a scene in a particular art style? Provide a reference image and the model handles the translation. This dual-guidance approach is a meaningful step forward for creative workflows where precision matters more than convenience, and the MIT license means developers can integrate it into commercial products without restriction.

Key Details

  • Architecture: Qwen2.5-VL-3B vision-language model for semantic understanding paired with Wan2.2-TI2V-5B video diffusion transformer for generation
  • Training data: 477,000 high-quality quadruplets (source video, instruction, reference image, edited video)
  • Benchmark: 3.02 overall on OpenVE-Bench (evaluated by Gemini-2.5-Pro), highest among open-source methods across five editing categories
  • Global edits: Style transfers including cartoon, sketch, watercolor, and other visual aesthetics
  • Local edits: Object removal, object addition, object replacement, and background swaps
  • Resolution: 720p output
  • License: MIT (fully permissive for commercial use)
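The training data above is organized as quadruplets of source video, instruction, reference image, and edited video. As a minimal sketch of how such a sample might be represented in a data pipeline (all field names here are hypothetical illustrations, not taken from the Kiwi-Edit codebase):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EditSample:
    """One training quadruplet (hypothetical schema, not Kiwi-Edit's actual format)."""
    source_video: str                # path to the unedited input clip
    instruction: str                 # natural-language edit instruction
    reference_image: Optional[str]   # optional style/appearance reference
    edited_video: str                # path to the ground-truth edited clip

    def is_reference_guided(self) -> bool:
        # Distinguishes reference-guided edits from instruction-only ones
        return self.reference_image is not None


# Example: a local edit (object replacement) guided by a reference image
sample = EditSample(
    source_video="clips/street.mp4",
    instruction="Replace the car with the one in the reference image",
    reference_image="refs/red_coupe.png",
    edited_video="clips/street_edited.mp4",
)
```

A global edit such as a watercolor style transfer could use the same schema with `reference_image=None`, relying on the instruction alone; the optional field is what lets one format cover both guidance modes.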

What to Do Next

The complete codebase is available on GitHub with setup instructions and a demo script, and the project page includes video examples of each editing category in action. The full research paper on arXiv covers the three-stage training strategy and ablation studies for anyone who wants a deeper look at the architecture and training pipeline. Video creators working with AI editing tools should evaluate Kiwi-Edit against their current pipeline, particularly for tasks where reference-image guidance could replace lengthy prompt engineering. The MIT license and available training code also make it a strong foundation for fine-tuning on domain-specific editing tasks.