LLM - Creative AI News

Deep Dive Jun 9, 2026

Claude Fable 5 vs Opus 4.8: What Creators Get

Anthropic released Claude Fable 5, its most capable model, on June 9, 2026, free on paid plans through June 22. Here is how it compares to Opus 4.8.

AI Models Jun 8, 2026

Xiaomi MiMo Hits 1000 Tokens Per Second on 1T Open Model

Xiaomi MiMo-V2.5-Pro-UltraSpeed claims a decode speed of 1000 tokens per second on a 1T MoE. Open-weights FP4 checkpoint plus a 2-week free API trial.

LLM Jun 3, 2026

ICML 2026: AI Models Could Run on 97% Less Memory

New ICML 2026 research shows transformer models can share attention projections, achieving up to 96.9% KV cache reduction with minimal accuracy loss.

AI Jun 1, 2026

Nemotron 3 Ultra: NVIDIA's 550B Open-Weights MoE

NVIDIA released Nemotron 3 Ultra on June 1, 2026: a 550B mixture-of-experts model with 55B active parameters and open weights on Hugging Face, offering 5x faster inference and 30% lower cost than Nemotron 2.

News May 29, 2026

Tesla V100 in a Gaming PC: 32GB Local LLM for £200

Developer Oscar Molnar installed a secondhand Tesla V100 SXM2 into his gaming PC alongside an RTX 4080, building a 32GB dual-GPU setup for under £200 total.

AI Tools May 24, 2026

George Hotz: AI Agents Are a Slot Machine for Creative Work

George Hotz argues AI agents frontload impressive progress but stall on polish, creating a golden era of AI-generated slop that creators need to plan for.

News May 21, 2026

Qwen3.7-Max vs Claude, Gemini, GPT-5.5: Compared

Qwen3.7-Max sets a new bar for non-hallucination rate among frontier agent models. Here is how it stacks up against Claude Opus 4.7, Gemini 3.1 Pro, and GPT-5.5 across the four reliability dimensions that decide which model goes into production.

News May 21, 2026

Cohere Command A+ Opens 218B Apache 2.0 Frontier Model

Cohere released Command A+ under Apache 2.0 on May 21, 2026. The 218B sparse MoE runs on two H100s, with native citations and 48 languages.

News May 19, 2026

NVIDIA Nemotron Diffusion: 3x Faster LLM Decoding

NVIDIA released the Nemotron-Labs-Diffusion family on Hugging Face, an open-weights LLM that switches between autoregressive, diffusion, and self-speculation decoding for 2.7x to 3.3x throughput gains.

News May 19, 2026

Gemini 3.5 Flash: $1.50 Coding Model Beats 3.1 Pro

Google launched Gemini 3.5 Flash at I/O 2026, a Flash-tier model that beats 3.1 Pro on coding and agent benchmarks at 40% lower cost.

News May 18, 2026

Qwen 3.7 Max and Plus Preview Land in Arena Top 15

Alibaba pushed Qwen 3.7 Max Preview and Qwen 3.7 Plus Preview to Arena and Qwen Chat for testing. Max sits 13th overall on Arena Text and Plus is 16th on Vision.

Deep Dive May 15, 2026

DeepSeek-V4-Flash Makes LLM Steering Practical

DeepSeek-V4-Flash is the first local model competitive with frontier AI, which makes LLM activation steering practical. A guide for creators.

AI Tools May 15, 2026

RelaxAI: UK-Hosted LLM API 80% Cheaper Than OpenAI

Civo Limited’s RelaxAI offers UK-sovereign LLM inference at £0.10 per million input tokens, with ISO 27001 certification and 100% UK data residency.

Deep Dive May 14, 2026

OpenAI vs. Apple: The ChatGPT-Siri Deal Is Falling Apart

OpenAI enlisted outside legal counsel to explore action against Apple after the ChatGPT-Siri integration failed to generate expected subscription revenue.

AI Tools May 14, 2026

OpenAI Codex Now in ChatGPT iOS: Monitor Projects Anywhere

OpenAI Codex is now in the ChatGPT iOS app, letting creators monitor projects, review diffs, and push code changes without being at a desk.

AI Research May 13, 2026

Recursive Superintelligence: $650M for Self-Improving AI

Richard Socher's AI lab Recursive exits stealth with $650M at a $4.65B valuation, backed by GV, Greycroft, NVIDIA, and AMD.

AI Tools May 12, 2026

Anthropic Now Has More Business Customers Than OpenAI

For the first time, more businesses pay for Anthropic than OpenAI, according to May 2026 spending data from Ramp showing 34.4% vs 32.3% business adoption.

Mistral May 12, 2026

Mistral Python Package Backdoor: Check Your SDK Now

mistralai==2.4.6 was backdoored in the Mini Shai-Hulud supply chain attack. PyPI quarantined the project. Here is what to check and do if you build with Mistral's Python SDK.

Deep Dive May 11, 2026

Thinking Machines TML-Interaction: Full-Duplex Voice AI

Mira Murati's Thinking Machines shipped TML-Interaction-Small, a 276B full-duplex voice model that listens and speaks simultaneously, beating GPT-Realtime on latency and interaction quality.

Deep Dive May 6, 2026

Claude Agents Gain Self-Learning Memory and Parallel Teams

Anthropic adds four capabilities to Claude Managed Agents: dreaming (self-improving memory), outcomes (quality grading), multiagent orchestration, and webhooks. Harvey saw a 6x completion improvement; Wisedocs cut review cycles by 50%.

Deep Dive May 5, 2026

SubQ Launches With 12M Context and 300x Lower Cost

Subquadratic SubQ launched May 5 with a 12M token context window and an OpenAI-compatible API. Headline claim: 300x lower cost than Claude Opus on RULER 128K.

News May 5, 2026

GPT-5.5 Instant: ChatGPT's New Default Cuts Hallucinations

OpenAI replaced GPT-5.3 Instant with GPT-5.5 Instant as ChatGPT's default model, citing 52% fewer hallucinated claims and a new memory source viewer.

Apple May 4, 2026

iOS 27 Brings AI Model Choice to Apple Intelligence

iOS 27 Extensions lets you swap in Claude, Gemini, ChatGPT, or Grok for Apple Intelligence tasks (Writing Tools, Image Playground, Siri). Includes a comparison, a setup workflow, and pricing math for working creators using iPhone, iPad, and Mac in 2026.

Deep Dive Apr 30, 2026

Anthropic's $50B Round Targets a Compute Ceiling

Anthropic is closing a $50 billion round at an $850 to $900 billion valuation, the largest private AI raise in history and the company's final financing before a late-2026 IPO.
