PARE: Run Wan2.1-14B Video Models at Half Compute

PARE achieves a 52% parameter reduction on Wan2.1-14B with only a 0.6-point drop on VBench, using spatial-temporal aware pruning and content-adaptive routing. Combined with step distillation, the total speedup reaches 50x.

NVIDIA Nemotron Diffusion: 3x Faster LLM Decoding

NVIDIA released the Nemotron-Labs-Diffusion family on Hugging Face: open-weights LLMs that switch between autoregressive, diffusion, and self-speculation decoding for 2.7x to 3.3x throughput gains.

ByteDance Lance: 3B Open Model for Image and Video

ByteDance Research released Lance, a 3B unified multimodal model under Apache 2.0 that handles image and video generation, editing, and understanding in a single framework. It posts strong VBench and GenEval scores.

Zerostack: A Rust Coding Agent With 8MB RAM

Zerostack is a pure Rust coding agent that launched on May 16, 2026. It runs in 8MB of RAM, compared with 300MB for JavaScript-based alternatives like Opencode.

IBM Granite Embedding R2: Open Multilingual RAG Models

IBM has released Granite Embedding Multilingual R2, a pair of Apache 2.0 embedding models with 32K context, support for 200+ languages, and a top MTEB score under 100M parameters. The models are pitched as a drop-in swap for paid commercial embedding APIs.
