IBM has released Granite Embedding Multilingual R2, a pair of Apache 2.0 embedding models that cover 200+ languages, support a 32,768-token context window, and handle code retrieval across nine programming languages. The 97M-parameter variant posts the highest MTEB Multilingual Retrieval score of any open embedding model under 100M parameters (60.3, a 9.4-point jump over multilingual-e5-small). Both models shipped on Hugging Face on May 14, 2026.

How to Integrate This in Your RAG Pipeline

If you run a retrieval-augmented generation stack on OpenAI's text-embedding-3-small or multilingual-e5-small, Granite R2 is a self-hosted replacement with no per-token API fee: you pay only for the hardware it runs on. Install sentence-transformers, load ibm-granite/granite-embedding-97m-multilingual-r2, and re-embed your corpus; vectors from different embedding models are not comparable, so the old index cannot be reused. The model card includes a three-line example, and the project ships a live demo Space for testing queries in English, German, Japanese, and other languages before you commit. The 97M model runs at roughly 2,500 documents per second on a single H100, fast enough to re-index a sizable knowledge base in minutes.
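The install, load, and re-embed steps can be sketched roughly as follows. This is a minimal illustration, not the model card's own example: the helper names (load_granite_encoder, build_index, search) are hypothetical, the index is a plain in-memory list rather than a vector database, and sentence-transformers is imported lazily so the ranking logic can be exercised with any encoder function.

```python
import math
from typing import Callable, List, Sequence

# Model ID as given in the article; check the Hugging Face model card before relying on it.
MODEL_ID = "ibm-granite/granite-embedding-97m-multilingual-r2"

Vector = Sequence[float]
Encoder = Callable[[List[str]], List[Vector]]


def load_granite_encoder(model_id: str = MODEL_ID) -> Encoder:
    """Return an encode(texts) function backed by sentence-transformers."""
    # Imported here so the rest of the module works without the package installed.
    from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

    model = SentenceTransformer(model_id)

    def encode(texts: List[str]) -> List[Vector]:
        # normalize_embeddings=True returns unit-length vectors.
        return model.encode(texts, normalize_embeddings=True).tolist()

    return encode


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(encode: Encoder, docs: List[str]) -> List[Vector]:
    """Re-embed the whole corpus with the new model."""
    return encode(docs)


def search(encode: Encoder, index: List[Vector], query: str, k: int = 3) -> List[int]:
    """Return the positions of the k documents most similar to the query."""
    q = encode([query])[0]
    ranked = sorted(range(len(index)), key=lambda i: cosine(q, index[i]), reverse=True)
    return ranked[:k]
```

In use, something like encode = load_granite_encoder() followed by index = build_index(encode, docs) and search(encode, index, "your query") would cover the basic loop; the first call downloads the model weights. A production stack would normally hand the vectors to its existing vector store instead of sorting a Python list.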

Why It Matters

Embedding models are the unglamorous infrastructure under every RAG system, search experience, and "chat with your docs" feature. The leading commercial options charge per token and route through US-hosted APIs, which is a problem for teams that handle non-English content or serve customers with data-sovereignty requirements. Granite R2 closes the gap: it is Apache 2.0 licensed, it runs anywhere, and the 311M variant ranks number one on the LongEmbed benchmark for documents longer than 4K tokens, the range where many enterprise PDFs fall. The release lands three weeks after IBM shipped Granite 4.1, the company's 512K-context open LLM, giving teams an end-to-end open stack for retrieval-then-generation pipelines.

Key Details

The release ships two sizes: a 97M-parameter model with 384-dimensional embeddings and a 311M-parameter model with 768-dimensional embeddings. Both share the same 32,768-token context window and the same 200+ language coverage, with 52 languages receiving enhanced retrieval training (including Arabic, Bengali, Chinese, French, German, Hindi, Japanese, Korean, Spanish, Turkish, and Vietnamese). Nine programming languages are explicitly trained for code retrieval: Python, Go, Java, JavaScript, PHP, Ruby, SQL, C, and C++.
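The two embedding widths translate directly into index size, which is often the deciding factor between the 97M and 311M models. The arithmetic below is a back-of-the-envelope sketch: it assumes vectors are stored as 4-byte float32 values with no index overhead, uses a hypothetical corpus of one million chunks, and takes the roughly 2,500 documents per second figure quoted for the 97M model on one H100 at face value (real throughput depends on document length and batch size).

```python
def index_size_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the vectors alone (float32 by default, no index overhead)."""
    return num_vectors * dim * bytes_per_value


def reindex_seconds(num_docs: int, docs_per_second: float = 2500.0) -> float:
    """Time to re-embed a corpus at a fixed, assumed throughput."""
    return num_docs / docs_per_second


if __name__ == "__main__":
    corpus = 1_000_000  # hypothetical corpus size
    for name, dim in [("97M model", 384), ("311M model", 768)]:
        gb = index_size_bytes(corpus, dim) / 1e9
        print(f"{name}: {dim} dims -> {gb:.3f} GB of raw vectors")
    # Throughput figure applies to the 97M model only.
    print(f"Re-index time at 2,500 docs/s: {reindex_seconds(corpus) / 60:.1f} minutes")
```

Under these assumptions the 384-dimensional index needs about 1.5 GB per million vectors and the 768-dimensional one about 3.1 GB, so doubling the width doubles the storage.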

IBM trained on its own GneissWeb corpus and skipped the MS-MARCO dataset. According to IBM's Granite governance review, that makes the model easier to clear for commercial deployment. The technical report is on arXiv.

What to Do Next

Pick one place in your stack where you currently call a paid embedding API. Re-embed the same corpus with the 97M Granite model and run your existing test queries against both indexes. If retrieval quality holds, you have removed an API line item and unlocked offline deployment in one swap.
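One way to make "retrieval quality holds" concrete is to compute recall@k for both indexes on the same labeled queries. The sketch below assumes you already have, for each test query, the ranked document IDs returned by each index and a set of IDs you consider relevant; the function name and the sample data are hypothetical.

```python
from typing import Dict, List, Set


def recall_at_k(
    ranked: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 5,
) -> float:
    """Mean fraction of relevant documents found in the top k results per query."""
    scores = []
    for query, gold in relevant.items():
        if not gold:
            continue  # skip queries with no labeled relevant documents
        hits = len(set(ranked.get(query, [])[:k]) & gold)
        scores.append(hits / len(gold))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical labels and results, for illustration only.
    relevant = {"q1": {"a"}, "q2": {"b", "c"}}
    paid_api = {"q1": ["a", "x"], "q2": ["b", "c"]}
    granite = {"q1": ["x", "a"], "q2": ["b", "y"]}
    print("paid API recall@2:", recall_at_k(paid_api, relevant, k=2))
    print("Granite recall@2:", recall_at_k(granite, relevant, k=2))
```

If the two numbers are close on a query set that reflects your real traffic, the swap is probably safe; a large drop on specific languages or document types tells you where to look before migrating.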