Qwen 3.7 Plus: Vision and Video Come to Bailian

Alibaba's Qwen team launched Qwen3.7-Plus on the Bailian platform on June 2, 2026, the multimodal sibling to last month's Qwen3.7-Max. The model adds native image and video understanding to the Qwen3.7 reasoning stack, with deep reasoning, tool invocation, self-programming, verification, testing, and autonomous iteration available through a single API endpoint.

What Creators Can Try Today

Qwen3.7-Plus is accessible through Alibaba Cloud's Bailian, rebranded internationally as Model Studio. The fastest evaluation path: open the playground, paste a long PDF or a video frame sequence, and ask the model to extract structured data while citing source frames. The same prompt flow used for closed-source vision models like GPT-5.5-V or Claude Opus 4.8 transfers without modification, which makes side-by-side benchmarking on your own evaluation set straightforward.

Why It Matters

The Qwen line has been the de facto open-source frontier for non-US labs across 2026, with the open-weights Qwen3.6 family powering production deployments at every layer from single-GPU local inference to 1T-parameter cluster setups. Qwen3.7-Plus is closed-weights on Bailian today, but the historical pattern is that smaller open-weights variants follow the closed flagship within four to eight weeks. Watch the Qwen organization on Hugging Face for the open-weights releases that traditionally follow each Bailian flagship.

What the Vision and Video Stack Adds

The release extends Qwen into the agent-model category that Qwen Chat users and developers have been asking for: native vision input, video frame understanding, tool calls, and self-verification loops in one model rather than three. For document-heavy workflows, that means one API call replaces a chain of OCR plus extraction plus verification. For video workflows, it means a single model can read a frame sequence, reason about what is happening, and return structured output that drives downstream automation.

Key Details

The model adds vision and video understanding on top of the Qwen3.7-Max reasoning core released in May. Core capabilities listed at launch: image understanding, video understanding, deep reasoning, tool invocation, verification, testing, self-programming, and autonomous iteration. Availability is via the Bailian Model Studio API only at launch, with the same pricing structure as the Qwen3.7-Max tier. International developers access through Alibaba Cloud's English-language Model Studio dashboard. Coverage at MarkTechPost on the Qwen3.7-Max release documents the underlying reasoning stack Plus inherits.

What to Do Next

If your workflow involves long-form document parsing, video frame extraction, or any agentic pipeline that mixes vision and tool calls, run Qwen3.7-Plus against your evaluation suite this week. The cost-per-output-token math is the deciding factor for production: Qwen has consistently priced 30 to 60 percent under the equivalent closed-source US tier, and the same pattern likely holds here.

Qwen3.7-Plus Adds Vision and Video to Reasoning Stack

What Creators Can Try Today

Why It Matters

What the Vision and Video Stack Adds

Key Details

What to Do Next

Keep reading

NVIDIA Cosmos Coalition: Runway, BFL, LTX Join

OpenRouter Adds Voice, Model Fusion, and 20 New Models

JetBrains Mellum2 Thinking: Apache 2.0 MoE Coding Model

What Creators Can Try Today

Why It Matters

What the Vision and Video Stack Adds

Key Details

What to Do Next

Stay ahead of AI

Keep reading

NVIDIA Cosmos Coalition: Runway, BFL, LTX Join

OpenRouter Adds Voice, Model Fusion, and 20 New Models

JetBrains Mellum2 Thinking: Apache 2.0 MoE Coding Model

Stay ahead of Creative AI