AI agents that navigate codebases just got a meaningful efficiency upgrade. Semble, an open-source code search library from MinishLab, reached 247 upvotes on Hacker News on May 17, 2026, after showing that AI agents can find relevant code with 94% recall using just 2,000 tokens. The alternative -- traditional grep-and-read workflows -- requires roughly 100,000 tokens to achieve 85% recall, a lower quality result at 50 times the cost. The project is available on GitHub under the MIT license and hit 1,455 stars within hours of launch.
What Happened
MinishLab published the Show HN post for Semble on May 17, 2026, introducing a code search tool built specifically for the retrieval patterns AI coding agents use. The library indexes a typical repository in 263 milliseconds and answers queries in 1.5 milliseconds, making codebase exploration fast enough to use inside an agent loop without introducing meaningful latency. The package is available on PyPI and ships with an MCP server that integrates directly into Claude Code.
The release attracted significant attention in the Hacker News discussion because the token reduction claim is verifiable and the tooling is immediately usable. Unlike many AI tooling announcements, Semble ships with benchmarks across 1,250 queries on 63 repositories in 19 programming languages.
Why It Matters

Token costs are the hidden tax on every AI coding workflow. When an agent needs to explore an unfamiliar codebase, the standard approach -- grep for a keyword, then read every matching file -- burns through the context window fast. At roughly 100,000 tokens per codebase search, two searches are enough to fill a 200K context window, so a moderately complex refactoring task can run out of room before the agent has begun writing code.
For creators building AI-powered tools -- ComfyUI custom nodes, Blender scripts, image generation pipelines, or Claude Code agents -- this token ceiling is a practical limit on what AI can accomplish in a single session. Every token spent reading irrelevant code is a token unavailable for the actual work: understanding intent, generating solutions, and reviewing output.
Semble removes that ceiling. It returns only the code chunks that match the query, keeping context consumption at roughly 2,000 tokens per search regardless of repository size. The result: more tokens for reasoning and code generation, fewer wasted on file scanning.
Key Details

How Semble Works
Semble uses a hybrid retrieval approach that combines two complementary search techniques:
- Semantic search via Model2Vec embeddings using the code-specialized potion-code-16M model, which finds code based on meaning rather than exact keywords
- Lexical search using the BM25 algorithm, which excels at matching specific identifiers and API names like save_pretrained or torch.cuda.is_available
Results from both methods are merged using Reciprocal Rank Fusion, then reranked with code-aware signals: definition chunks get priority, test files and legacy code are penalized, and symbol-like queries shift weight toward lexical matching. The Chonkie library handles intelligent code chunking before indexing.
This pipeline achieves something the transformer-based baseline does not: near-transformer retrieval quality with indexing that takes milliseconds rather than close to a minute.
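The fusion step described above can be sketched in a few lines. This is a minimal illustration of Reciprocal Rank Fusion in general, not Semble's actual implementation: the constant k=60 is a commonly used default assumed here, the `weights` parameter is a hypothetical knob standing in for "shift weight toward lexical matching", and the chunk ids are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, weights=None):
    """Fuse several best-first ranked lists of chunk ids into one ranking.

    k is the RRF smoothing constant (60 is an assumed, commonly used value).
    weights is a hypothetical per-ranking multiplier, e.g. to favor the
    lexical list for symbol-like queries.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes weight / (k + rank) for every chunk it returns.
            scores[chunk_id] += weight / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Illustrative chunk ids only; these are not real Semble results.
semantic = ["utils.py:load", "model.py:save_pretrained", "io.py:read"]
lexical = ["model.py:save_pretrained", "tests/test_model.py:test_save", "utils.py:load"]

fused = reciprocal_rank_fusion([semantic, lexical])
print(fused)
```

A chunk that both retrievers rank highly (here `model.py:save_pretrained`) rises to the top even though neither list needs comparable raw scores, which is why rank fusion is a convenient way to combine embedding similarity with BM25.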
Performance Benchmarks
MinishLab tested Semble against two baselines on 1,250 queries across 63 repositories:
| Method | NDCG@10 | Index Time | Query p50 |
|---|---|---|---|
| Semble | 0.854 | 263 ms | 1.5 ms |
| CodeRankEmbed Hybrid | 0.862 | 57 s | 16 ms |
| BM25 only | 0.673 | 263 ms | 0.02 ms |
Semble reaches 99% of CodeRankEmbed Hybrid retrieval quality while indexing 218 times faster and answering queries 11 times faster. The 0.008 NDCG@10 gap is negligible in practice; the speed and token differences are transformative for any workflow that runs many queries.
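The ratios quoted above can be re-derived from the table. The sketch below uses only the rounded figures as published, so the index speedup comes out near 217 rather than exactly 218; the small difference is presumably rounding in the reported timings.

```python
# Figures copied from the benchmark table (rounded as published).
semble = {"ndcg": 0.854, "index_ms": 263.0, "query_ms": 1.5}
coderank = {"ndcg": 0.862, "index_ms": 57_000.0, "query_ms": 16.0}

quality_ratio = semble["ndcg"] / coderank["ndcg"]          # share of quality retained
index_speedup = coderank["index_ms"] / semble["index_ms"]  # how much faster indexing is
query_speedup = coderank["query_ms"] / semble["query_ms"]  # how much faster a query is

print(f"quality retained: {quality_ratio:.1%}")
print(f"index speedup:    {index_speedup:.0f}x")
print(f"query speedup:    {query_speedup:.1f}x")
```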
Token Efficiency: The Real Number
MinishLab calculated the 98% token reduction as the ratio of snippet characters to full file characters across the benchmark set. At the 94% recall threshold:
- Semble: approximately 2,000 tokens per query
- Grep and full file reads: approximately 100,000 tokens, with lower 85% recall
Semble uses fewer tokens and returns higher quality results. An agent using Semble gets more relevant code in less context, which means more accurate completions and fewer follow-up queries to fill in gaps.
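As a quick sanity check on these figures, the per-query token counts imply the same 98% reduction, and they also show how many searches fit in a context window. The 200K window is the example size used earlier in this article; the calculation ignores every other use of context, so treat it as an upper bound rather than a realistic budget.

```python
SEMBLE_TOKENS = 2_000      # approximate tokens per search with Semble
GREP_TOKENS = 100_000      # approximate tokens per search with grep + full file reads
CONTEXT_WINDOW = 200_000   # example window size; an assumption for illustration

reduction = 1 - SEMBLE_TOKENS / GREP_TOKENS
searches_with_grep = CONTEXT_WINDOW // GREP_TOKENS
searches_with_semble = CONTEXT_WINDOW // SEMBLE_TOKENS

print(f"token reduction: {reduction:.0%}")
print(f"searches per window (grep):   {searches_with_grep}")
print(f"searches per window (Semble): {searches_with_semble}")
```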
What to Do Next
Add Semble to Claude Code
If you use Claude Code for AI-assisted development, adding Semble takes one command:
claude mcp add semble -s user -- uvx --from "semble[mcp]" semble
Open your project in Claude Code. Semble will index the repository automatically when Claude Code first searches the codebase. Check token savings after a session:
semble savings
Install for Python Workflows
For custom agent pipelines or automated scripts, install via PyPI:
pip install semble
Then query any local codebase:
from semble import SembleIndex
index = SembleIndex.from_path("./my-comfyui-extension")
results = index.search("how does the sampling step work", top_k=5)
for r in results:
    print(r.file, r.start_line)
    print(r.content[:200])
To index a remote repository without cloning it, pass a GitHub URL to SembleIndex.from_git().
Creator Outcome: What This Enables
For creators using AI coding agents on their tools and pipelines, Semble enables three concrete improvements:
Longer sessions on complex tasks. When code search uses 2,000 tokens instead of 100,000, the remaining context goes toward actual work: understanding intent, writing code, and reviewing changes. Tasks that previously required multiple sessions -- building a full ComfyUI workflow adapter or wiring a multi-step Blender automation -- can complete in one.
Faster iteration on large plugin ecosystems. Navigating a codebase where understanding one component requires tracing five files is where grep approaches collapse. Semble returns the relevant chunks across all files in a single query, giving agents the cross-file context they need without reading everything.
Lower costs on automated workflows. For creators running nightly AI-assisted code reviews, automated documentation generators, or CI-based refactoring pipelines, the 98% token reduction applies to every automated run. At the figures above, a workflow running 50 code searches per day saves about 98,000 tokens per search, or roughly 147 million tokens over a 30-day month.
For more on AI coding agent workflows, see the Claude Code parallel sessions guide on running multiple agents concurrently, or the OpenSquilla agent runtime for token routing strategies that complement Semble.
Frequently Asked Questions

Does Semble work with any programming language?
Yes. MinishLab benchmarked Semble across 63 repositories in 19 programming languages. The tool uses language-agnostic chunking and works with Python, JavaScript, TypeScript, Go, Rust, C++, and other common languages without configuration.
Does Semble send my code to an external service?
No. Semble runs entirely on CPU using locally computed Model2Vec embeddings. No data leaves your machine, and there is no external API dependency. For creators working with proprietary pipelines or client code, the local-only design is a meaningful privacy guarantee.
How does Semble compare to GitHub Copilot code search?
Copilot code intelligence is cloud-based and tied to GitHub. Semble is local, open-source, and model-agnostic -- it connects to any agent via MCP or Python API. It is also significantly more token-efficient because it returns only the relevant snippet rather than broad file context.
Can I use Semble with agents other than Claude Code?
Yes. Semble exposes a Python API and CLI that any agent can call. The MCP server is the integration shown above for Claude Code, and other MCP-capable clients should be able to connect to it as well. Any agent that supports tool calling or can invoke shell commands can use Semble for code search.
How long does indexing take on a large codebase?
MinishLab reports 263 milliseconds for typical repositories. Exact time scales with repository size, but in MinishLab's benchmark Semble indexed 218 times faster than the CodeRankEmbed Hybrid baseline. Even for larger codebases, indexing completes before the first agent response in most sessions.
What is the license?
MIT. Semble is free for personal and commercial use, provided the copyright and license notice is kept with copies of the software. The model weights and all dependencies are permissively licensed.