Arcee AI has released Trinity-Large-Thinking, an open-weight reasoning model under Apache 2.0 that ranks second on PinchBench behind only Claude Opus 4.6. The model costs $0.90 per million output tokens, roughly 96% less than comparable closed alternatives.
What Happened
Trinity-Large-Thinking adds extended reasoning before generating responses, improving multi-turn tool calling, context coherence, and instruction following across long agent sessions. The model has served 3.37 trillion tokens on OpenRouter since its Preview phase began in January 2026, establishing real-world usage data ahead of this official release.
The model is available through Arcee's API and on Hugging Face under Apache 2.0. The Preview version remains free on OpenRouter with reduced hardware allocation for testing.
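For a quick first test, the sketch below calls the model through OpenRouter's OpenAI-compatible endpoint using the official openai Python SDK. The model slug is an assumption based on the article's naming; verify the actual identifier on OpenRouter's model list before use.

```python
# Minimal sketch: querying the model via OpenRouter's
# OpenAI-compatible chat completions endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter's OpenAI-compatible API
    api_key="YOUR_OPENROUTER_API_KEY",
)

response = client.chat.completions.create(
    model="arcee-ai/trinity-large-thinking",  # hypothetical slug; confirm before use
    messages=[
        {"role": "user", "content": "Plan the steps to refactor a flaky test suite."}
    ],
)
print(response.choices[0].message.content)
```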
Why It Matters
Open-weight reasoning models have been scarce at the top of benchmark leaderboards; the high performers remain closed-source with premium pricing. Trinity-Large-Thinking taking second place on PinchBench under an open license gives developers a viable alternative for building open-source AI agents without per-token costs spiraling during complex workflows.
The 96% cost reduction matters most for agent workloads where models make dozens of tool calls per task. At $0.90 per million output tokens, teams running continuous agent loops can sustain operations that would be prohibitively expensive with closed models.
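A back-of-envelope sketch makes the difference concrete. The per-task token counts below are illustrative assumptions, and the closed-model rate is simply what the article's 96% figure implies ($0.90 is 4% of $22.50).

```python
# Back-of-envelope cost comparison for an agent workload.
# Token counts per task are assumed, not measured.
OPEN_RATE = 0.90 / 1_000_000    # $ per output token (Arcee API)
CLOSED_RATE = 22.50 / 1_000_000  # $ per output token (implied by the 96% figure)

tool_calls_per_task = 30         # assumed: "dozens of tool calls per task"
output_tokens_per_call = 2_000   # assumed average

tokens = tool_calls_per_task * output_tokens_per_call
print(f"Output tokens per task: {tokens:,}")
print(f"Open-weight cost:  ${tokens * OPEN_RATE:.3f} per task")
print(f"Closed-model cost: ${tokens * CLOSED_RATE:.3f} per task")
# 60,000 output tokens -> $0.054 vs $1.350 per task
```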
Key Details
- Apache 2.0 license for full commercial use
- Ranked #2 on PinchBench, behind Claude Opus 4.6
- $0.90 per million output tokens via Arcee API
- 3.37 trillion tokens served during Preview phase
- Optimized for multi-turn tool calling and long-running agent loops
- Available on Hugging Face and OpenRouter
What to Do Next
Developers building agent workflows can test Trinity-Large-Thinking for free via OpenRouter's Preview tier. For production deployments, the Arcee API offers the full model at $0.90 per million output tokens. The Apache 2.0 license also allows self-hosting for teams that want to run inference on their own hardware.
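For self-hosting, a minimal sketch with Hugging Face transformers follows. The repository id is a guess based on the model name; confirm the actual repo on Hugging Face and plan for substantial GPU memory, since a model of this class will not fit on consumer hardware.

```python
# Minimal self-hosting sketch with Hugging Face transformers.
# Requires the accelerate package for device_map="auto".
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "arcee-ai/Trinity-Large-Thinking"  # hypothetical repo id; verify on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto", torch_dtype="auto")

# Build a chat prompt and generate a response.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Summarize the tradeoffs of self-hosting."}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```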