On May 28, 2026, Mistral launched Search Toolkit, an open-source framework that consolidates the three stages every production RAG system needs (ingestion, retrieval, evaluation) into one library with consistent interfaces. The toolkit ships alongside a starter app template backed by Vespa for hybrid retrieval, so teams can move from prototype to working pipeline without choosing five separate tools and gluing them together.

Try It: Spin Up a RAG Pipeline in 10 Minutes

Clone search-starter-app, install dependencies, drop your documents into the input folder, and run the ingestion pipeline. The default config gives you parsing, chunking, embedding generation, BM25 sparse retrieval, dense embedding retrieval, and a hybrid blend out of the box. Quality metrics (recall, precision, MRR, NDCG) run against a test set you define, so you can measure whether a config change actually moved retrieval quality before you ship.
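For readers who want to see what those four metrics compute, here is a minimal sketch in plain Python. It is not the toolkit's API: the function names are hypothetical, relevance is assumed to be binary (a document is either relevant or not), and each ranked list is assumed to contain no duplicate IDs.

```python
import math
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top k result slots filled by relevant documents."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """MRR: the average reciprocal rank over (ranked, relevant) pairs."""
    scores = [reciprocal_rank(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: discounted gain divided by the ideal gain."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

Recall and precision tell you how much of the relevant material made it into the top k, while MRR and NDCG reward putting it near the top, which is why a config change can improve one pair and hurt the other.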

The fastest end-to-end demo is the documented Vespa setup. The official documentation walks through schema configuration, custom ingestion pipelines, and the retrieval API surface, all from the same starter template.

Why It Matters

Building production RAG has been a notorious tax on shipping AI features. Every team rebuilds the same five components (chunker, embedder, sparse index, dense index, eval harness) badly the first time. Mistral packaging the whole stack as open source and shipping it with an evaluator built in shortens the path from "we need RAG" to "we have measurable RAG." That matters for the same reason Mistral Vibe shipping its VS Code extension this week mattered: the company is closing toolchain gaps where users had been stitching together third-party pieces.

For working creators building AI products, the Toolkit is the answer to a common Friday question: how do we get our knowledge base into the agent without rewriting the chunker? You ingest, you index, you evaluate, you ship.

Key Details

Search Toolkit handles BM25 sparse retrieval, dense embedding retrieval, and hybrid configurations out of the box. The evaluation harness includes recall, precision, MRR, and NDCG against custom test sets. The team highlights four primary use cases: enterprise search across document repositories, RAG systems that need to isolate retrieval quality from generation quality, domain-specific retrieval for specialized terminology, and agent-powered systems that need indexed search alongside live data (see Mistral's main API docs).
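The announcement does not say how the toolkit blends sparse and dense results, so the sketch below swaps in reciprocal rank fusion (RRF), one widely used way to merge two ranked lists without having to normalize their scores. The function name is hypothetical, and the constant k=60 is a commonly used default rather than anything taken from Mistral's configuration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one fused ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID so the
    # output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Illustrative inputs: one list from a BM25 retriever, one from a dense retriever.
bm25_results = ["a", "b", "c"]
dense_results = ["b", "d", "a"]
fused = reciprocal_rank_fusion([bm25_results, dense_results])
```

RRF only looks at rank positions, which sidesteps the problem that BM25 scores and embedding similarities live on different scales. A weighted sum of normalized scores is the usual alternative when you want explicit control over how much each retriever contributes.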

The toolkit itself is open source and carries no fee; usage costs accrue only to whatever inference and storage providers you bind it to. The starter app uses Vespa by default, but the retrieval interface is provider-agnostic. Mistral did not publish a public benchmark suite at launch, so teams will need to evaluate against their own test sets to compare against LlamaIndex, LangChain RAG, or DSPy's retrieval modules.
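To make "provider-agnostic" concrete, here is one way such an interface could look in Python. Every name in it (Hit, Retriever, KeywordOverlapRetriever, top_ids) is hypothetical and chosen for illustration; none of it is taken from the toolkit's actual code. The idea is that calling code depends on a small protocol, and any backend that implements it can stand in.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class Hit:
    doc_id: str
    score: float


class Retriever(Protocol):
    """Hypothetical contract: any backend with this method can be plugged in."""

    def search(self, query: str, top_k: int) -> List[Hit]: ...


class KeywordOverlapRetriever:
    """Toy in-memory backend that scores documents by shared lowercase tokens."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self._tokens = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}

    def search(self, query: str, top_k: int) -> List[Hit]:
        terms = set(query.lower().split())
        hits = [Hit(doc_id, float(len(terms & tokens))) for doc_id, tokens in self._tokens.items()]
        # Drop documents with no overlap, then sort by score with a stable tie-break.
        hits = [hit for hit in hits if hit.score > 0]
        hits.sort(key=lambda hit: (-hit.score, hit.doc_id))
        return hits[:top_k]


def top_ids(retriever: Retriever, query: str, top_k: int = 3) -> List[str]:
    """Caller code sees only the protocol, never the concrete backend."""
    return [hit.doc_id for hit in retriever.search(query, top_k)]
```

Swapping the toy backend for a real search engine client would then mean writing one more class with the same search method, while the evaluation and application code stay untouched.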

What to Do Next

If you are building RAG for a creator tool, an agent, or an internal knowledge product, clone the starter template this week and run it against your existing dataset. The win is the eval harness: you can finally measure whether your retrieval is actually better after a config change, instead of guessing from anecdotal queries. Pair it with the rest of Mistral's open-source stack if you want a fully self-hosted pipeline.
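A before-and-after comparison can be sketched in a few lines. The example below is an assumption-laden illustration, not the toolkit's harness: compare_runs and its arguments are hypothetical names, and it uses reciprocal rank as the per-query score. It reports the mean change across the test set and also lists the individual queries that got worse, since an average can hide regressions.

```python
from typing import Dict, List, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def compare_runs(
    baseline: Dict[str, List[str]],
    candidate: Dict[str, List[str]],
    judgments: Dict[str, Set[str]],
) -> Tuple[float, List[str]]:
    """Return (mean change in reciprocal rank, queries that regressed).

    baseline and candidate map each query to its ranked document IDs;
    judgments maps each query to the set of IDs labeled relevant.
    """
    deltas: Dict[str, float] = {}
    for query, relevant in judgments.items():
        before = reciprocal_rank(baseline.get(query, []), relevant)
        after = reciprocal_rank(candidate.get(query, []), relevant)
        deltas[query] = after - before
    mean_delta = sum(deltas.values()) / len(deltas) if deltas else 0.0
    regressed = sorted(query for query, delta in deltas.items() if delta < 0)
    return mean_delta, regressed


# Illustrative test set with three labeled queries.
judgments = {"q1": {"d1"}, "q2": {"d2"}, "q3": {"d5"}}
baseline = {"q1": ["d9", "d1"], "q2": ["d2", "d3"], "q3": ["d5"]}
candidate = {"q1": ["d1", "d9"], "q2": ["d2", "d3"], "q3": ["d4", "d5"]}
mean_delta, regressed = compare_runs(baseline, candidate, judgments)
```

In this made-up data the candidate config improves q1 and hurts q3 by the same amount, so the mean change is zero even though behavior shifted. That is the case for reviewing the regressed list alongside the aggregate number.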