Liquid AI has released LFM2.5-350M, a 350-million-parameter open-weight language model built to run AI agents on devices as constrained as smartphones. The model needs as little as 81MB of memory on mobile GPUs and 169MB on NPUs, making it one of the smallest production-ready agent models available.
What Happened
LFM2.5-350M is the newest and smallest addition to Liquid AI's LFM2.5 family, which launched in January with 1.2B+ parameter models. The 350M variant was trained on 28 trillion tokens with scaled reinforcement learning, giving it strong instruction-following capabilities (IFEval score: 76.96) despite its compact size.
The model is optimized specifically for data extraction, structured outputs, and tool use on edge devices. It supports a 32k context window and generates up to 40,400 output tokens per second on a single H100. The weights are available on HuggingFace under an open-weight license.
Why It Matters
Running AI agents locally eliminates API costs, network latency, and the privacy risks of sending data to the cloud. At 81MB, LFM2.5-350M fits on hardware that cannot run even the smallest Llama or Mistral models. For creators building AI-powered tools, apps, or workflows, this opens agent capabilities on phones, tablets, and lightweight laptops without cloud dependencies.
The model joins a growing wave of small models designed for edge deployment, alongside Qwen 3.5 Small on the language side and SD3.5 Flash for on-device image generation. The trend is clear: capable AI is moving from data centers to personal devices.
Key Details
- 350M parameters, trained on 28T tokens with reinforcement learning
- 81MB on mobile GPUs, 169MB on NPUs via specialized inference engines
- 32k context window for processing longer documents and conversations
- Supports llama.cpp, MLX, and vLLM inference frameworks out of the box (see the sketch after this list)
- Optimized for tool use and structured outputs, not general knowledge tasks
- Open weights available on HuggingFace today
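
To make the framework support concrete, here is a minimal sketch of running the model through llama-cpp-python, one of the llama.cpp bindings. The GGUF filename is a hypothetical placeholder; substitute whichever quantized file you download from the model's HuggingFace page.

```python
# Minimal sketch: running a local GGUF build of the model via llama-cpp-python.
# pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="lfm2.5-350m-q4_k_m.gguf",  # hypothetical filename; use your downloaded file
    n_ctx=32768,                           # matches the advertised 32k context window
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Extract the requested fields as JSON."},
        {"role": "user", "content": "Invoice #4821, due 2025-03-01, total $314.15."},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```

The same GGUF file works from the llama.cpp CLI; MLX and vLLM follow their own model-loading conventions.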
What to Do Next
Developers building mobile or embedded AI applications can download the model from HuggingFace and run it via llama.cpp or MLX. The model works best for agentic tasks like API calling, form filling, and data extraction rather than knowledge-heavy conversations or code generation. For creative workflows that need local AI processing without cloud costs, LFM2.5-350M is worth testing as a lightweight agent backbone.
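
Since structured output is the model's headline strength, one natural pattern is to constrain generation to a JSON schema so form-filling and extraction results parse reliably. The sketch below uses llama-cpp-python's schema-constrained response_format; the filename and form fields are illustrative assumptions, not anything defined by LFM2.5-350M itself.

```python
# Hedged sketch: schema-constrained extraction with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(model_path="lfm2.5-350m-q4_k_m.gguf", n_ctx=32768)  # hypothetical filename

# Illustrative form schema; decoding is constrained to match it.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "date": {"type": "string"},
    },
    "required": ["name", "email", "date"],
}

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Fill the form fields from the user's message."},
        {"role": "user", "content": "I'm Ada Lovelace, ada@example.com, meeting on 2025-02-14."},
    ],
    response_format={"type": "json_object", "schema": schema},
)
print(result["choices"][0]["message"]["content"])  # JSON matching the schema
```

Constraining output at decode time sidesteps retry loops on malformed JSON, which matters more on battery-powered hardware than in a data center.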