On May 12, 2026, OpenSquilla released v0.1.0, an Apache 2.0 microkernel runtime that wraps any AI agent in three things creators rarely build themselves: smart model routing, a four-tier memory system, and OS-level sandboxing. The community-built project claims 60 to 80 percent token-cost reductions versus running a single premium model for every request, and ships with a self-hostable gateway and 10+ built-in channel adapters out of the box.

What this enables: shrink an agent's monthly token bill

If you run sustained agent workloads (support bots, content pipelines, research assistants), the practical payoff is that simple turns get routed to cheaper models and reasoning billing is disabled for short queries. The router combines hand-crafted features (length, language, presence of code blocks, keyword signals) with embedding-based semantic features that run locally via ONNX. You point OpenSquilla at your existing Anthropic, OpenAI, or Ollama endpoints, and the gateway decides which to call on a per-turn basis. The setup is three commands: bash install.sh, opensquilla onboard, opensquilla gateway run.

Why it matters

Token costs are the silent killer of agent products. A handful of long sessions on a top-tier reasoning model can blow a startup's API budget for the month, and most projects react by switching wholesale to a cheaper model and accepting worse output. OpenSquilla picks the middle path: keep premium reasoning for the queries that need it, route the easy 70 percent to cheaper tiers, and keep the routing logic local instead of paying a third-party broker. The project is one of several open frameworks rebuilding the layer that LangGraph and proprietary cloud agents currently own.

Key details

The memory architecture is split into working, episodic, semantic, and raw audit tiers, with retrieval combining vector search and BM25 full-text search and periodic consolidation cycles called Memory Dream Consolidation. Sandboxing runs at the syscall layer using Bubblewrap on Linux and Seatbelt on macOS, with three policy tiers that gate tool execution and XML-escape inputs for prompt-injection defense. Requirements: Python 3.12 or later, Apache 2.0 license. TestingCatalog's writeup notes the launch includes a "10M Token Bill Challenge" offering free credits for early adopters.

What to do next

If you operate any agent in production, clone the repo and run the gateway against your current traffic to measure the actual routing savings on your workload. Vendor-published numbers should never be trusted in isolation. If you are still building, OpenSquilla is worth evaluating alongside the closed-source agent platforms before you commit a roadmap to one vendor's pricing curve.