llama.cpp Adds Audio Support With Qwen3-Omni
llama.cpp release b8769 adds audio multimodal support for Qwen3-Omni and Qwen3-ASR models, bringing local speech recognition and audio understanding to consumer hardware.
OpenBMB released VoxCPM2, a 2-billion-parameter text-to-speech model that runs on 8GB of VRAM, supports 30 languages at 48kHz, and can design voices from natural language descriptions.
Anthropic launched Claude for Word in public beta, bringing AI-powered document editing with native tracked changes to Microsoft Word on Mac and Windows.
MiniMax released Music 2.6 on April 10, adding a Cover feature that extracts a song's melodic skeleton and lets creators rebuild everything around it.
MiniMax released an official open-source CLI tool that brings AI video, music, image, and speech generation to the terminal with seven core capabilities.
Game studios reorganizing around AI are completing prototypes 4x faster and generating UI assets up to 20x faster, according to new Wharton research.
llama.cpp b8738 adds backend-agnostic tensor parallelism that enables large AI models to run across multiple GPUs without vendor-specific code.
Researchers released GaussiAnimate, a framework that automatically rigs and animates 3D Gaussian Splatting assets with a 17.3% quality improvement.
Overworld AI released Waypoint-1.5, a real-time video world model that generates interactive environments at 720p and 60 FPS on consumer GPUs like the RTX 3090.
Researchers from Tongji University, Tencent, and five other institutions released MegaStyle, a 1.4-million-image dataset for style transfer, alongside a FLUX-based model.
ElevenLabs has announced on-premise and on-device deployment options for its voice AI platform, letting organizations run text-to-speech inference entirely within their own infrastructure.
A team of 23 researchers released LPM 1.0, a 17-billion-parameter Diffusion Transformer that generates real-time character video from audio input.
PrismAudio, developed by Alibaba's FunAudioLLM team, generates spatial stereo audio directly from silent AI video files in an average of 0.63 seconds.
OpenAI launched a new $100/month ChatGPT Pro tier on April 9, delivering 5x more Codex usage than Plus and filling the gap between the $20 and $200 plans.
Adobe has launched two new AI image editing features in Firefly: Precision Flow generates multiple variations from a single prompt, while AI Markup lets creators draw on images to guide AI edits.
Suno's licensing negotiations with Universal Music Group and Sony Music have stalled over a core disagreement: whether users can download and share AI-generated songs outside the platform.
Meta launched Muse Spark on April 8, the first AI model built by its new Superintelligence Labs division. Unlike every Llama release before it, Muse Spark is proprietary.
PixVerse released C1, a new AI video generation model focused on cinematic production quality, on April 7. The model adds motion control, style transfer, and cinematic presets.
Cursor's AI code review tool Bugbot now resolves 78% of the bugs it flags, a 26-point jump from its 52% rate at launch in July 2025.
Canva has acquired Simtheory, an AI workspace for building custom agents, and Ortto, a marketing automation platform with over 11,000 customers across 190 countries.