mistral.rs v0.8.2: 3-5x Faster Local LLM on CUDA
mistral.rs v0.8.2 delivers 3.5-5.5x faster MoE prefill on CUDA, fused decode kernels, and agentic tool-calling improvements for local LLM workflows.
NVIDIA released Nemotron 3 Ultra on June 1, 2026: a 550B mixture-of-experts model with 55B active parameters and open weights on Hugging Face, with 5x faster inference and 30% lower cost than Nemotron 2.
StepFun ships Step 3.7 Flash: an open-weights 201B MoE VLM with 256K context, native video input, and FP8/NVFP4 builds for local deployment.
ByteDance releases Bernini-R, an open-source video editing model that handles object insertion, removal, and replacement in existing footage using text prompts.
OpenMOSS published the MOSS-Audio technical report on June 1, 2026, documenting four open-source audio-language models that achieve benchmark scores rivaling systems three to four times their size.
PewDiePie open-sourced Odysseus, a self-hosted AI workspace with chat, agents, deep research, and 270+ model serving. MIT license, no telemetry.
NVIDIA Cosmos 3 ranks first among open-source models on the Artificial Analysis Text-to-Image leaderboard. The 64B-parameter model ships with image-to-video capability and open commercial-use weights.
Pallaidium turns Blender's Video Sequence Editor into a complete AI movie studio. The May 31 release adds Blender 5.2 support and a redesigned plugin architecture.
Llama Studio v0.2.0 is a lightweight web interface for managing multiple llama-server sessions, with multi-GPU tensor splitting, shell-script configs, and auto-load snapshots.
A developer used Stability AI Stable Audio 3 Medium to generate 15,834 free audio samples: 10,359 drum one-shots and 5,475 pitched instrument recordings, available for immediate download.
A new open-source ComfyUI custom node called TextMakerPro brings a layer-based text and layout editor into Stable Diffusion workflows, letting creators design stylized text compositions without leaving ComfyUI.
Stable Audio Studio is a new open-source desktop application that runs Stability AI's Stable Audio Open 1.0 model directly on your machine. Released to GitHub in May 2026, the project gives creators a full audio generation environment with no account or subscription.
Baidu open-sourced NAVA, a 6.3B parameter joint audio-video model that generates 720p video with synced dual-channel audio in a single pass.
Liquid AI dropped LFM2.5-8B-A1B on Hugging Face on May 28, the first reasoning-tuned MoE in the LFM2.5 family with 8.3B params, 1.5B active per token, and built-in tool calling.
Shengshu AI released minWM on May 28, an Apache 2.0 framework that converts open video models like Wan2.1 and HunyuanVideo into real-time interactive world models with camera control.
NVIDIA is bringing Cosmos, Nemotron, GR00T, and Ising under OpenMDW-1.1. Here is what the unified AI model license means for creative AI developers.
NVIDIA ships an NVFP4 4-bit quantized build of Qwen3.6-35B-A3B, cutting GPU memory 3x with under 1% accuracy loss on eight benchmarks.
The AV1 successor is officially here with up to 40% better compression. Here is what the AV2 1.0 spec means for video creators and AI workflows.
parakeet.cpp ports NVIDIA Parakeet automatic speech recognition models to ggml, eliminating the Python runtime entirely, with byte-identical output to NeMo at up to 1.86x faster throughput.
A DPRK-linked supply chain attack plants a RAT via npm packages and routes all stolen credentials to private Hugging Face datasets. Two AI developer victims were confirmed on May 28, 2026.
Musicians and creators who want AI music generation without monthly fees now have a compelling open-source option. The Muser launched on GitHub on May 27, 2026 as a self-hostable platform that generates complete music tracks locally.
PrismML released Bonsai Image 4B with 1-bit and ternary checkpoints under Apache 2.0. The model retains 95% of FLUX.2 Klein 4B quality at 6.4x smaller size and runs directly on iPhone.
PARE achieves a 52% parameter reduction on Wan2.1-14B with only a 0.6-point drop on VBench, using spatial-temporal aware pruning and content-adaptive routing. Combined with step distillation, total speedup reaches 50x.
Microsoft open-sourced Lens, a 3.8-billion-parameter text-to-image diffusion model, on May 25, 2026. It rivals FLUX and SD3, runs in diffusers and ComfyUI, under MIT license.