Xiaomi cut MiMo-v2.5 API pricing by up to 99% effective May 27, 2026, and dropped the input-length tier structure entirely. The same announcement reset all existing token plan quotas and expanded usage volume on existing plans by 5 to 8 times. The MiMo-v2.5 series spans text chat, image understanding, audio understanding, video understanding, and a TTS variant, all exposed through Xiaomi's MiMo platform with OpenAI and Anthropic API-compatible endpoints.
What This Enables
If your creative pipeline already calls OpenAI or Anthropic SDKs, you can drop the MiMo base URL into your existing client and route through Xiaomi for the steps where you do not need frontier reasoning. The compatibility layer covers chat completions, multimodal inputs, and tool use, which means a director scouting reference images can hit MiMo-v2.5 vision for cheap captioning passes, a podcaster can run audio transcription through the audio-understanding variant before paying for Whisper-large, and a video editor can offload scene detection to MiMo before any frontier-class summarization. The 99% reduction puts ceiling-class multimodal inference within the budget of unfunded creator projects.
Why It Matters
This is the second permanent price-floor reset in May 2026, following the DeepSeek V4-Pro 75% permanent cut on May 23. Two cuts in five days from major Chinese labs signal that frontier-comparable multimodal inference is no longer a premium-tier capability. Western labs that price above this floor on commodity tasks (captioning, transcription, image understanding) will face pressure on the long tail of API consumption even if their reasoning models stay distinctly priced. For creators, the practical effect is that multimodal preprocessing steps move from cost-constrained to effectively free, which changes which workflows you can afford to build agents around.
Key Details
The cut covers the full MiMo-v2.5 series: chat, image understanding, audio understanding, video understanding, and MiMo-v2.5-TTS for speech synthesis. The announcement removed the input-length-based pricing differential, so long-context calls now pay the same per-token rate as short ones. Token plan users had quotas fully reset on May 27. Web search tool calls are bundled into the chat API on the same compatibility layer. Xiaomi positions the change as enabling broader access to "better models" rather than a competitive response to a specific Western lab, but the timing alongside rising HBM costs squeezing every other lab's margins reads as a deliberate floor reset.
What to Do Next
Audit your current OpenAI and Anthropic spend by call type. Anywhere you run captioning, transcription, simple image understanding, or video scene tagging at high volume, swap one workflow to MiMo-v2.5 and compare quality on your specific prompt set. The Chinese-language strength of MiMo means quality holds best on multilingual content, including subtitle generation and cross-language reference matching. Keep frontier reasoning calls on your existing provider while routing commodity multimodal preprocessing through the cheaper path.