PrismML released Bonsai Image 4B on May 26, 2026, shipping two extreme-quantization variants of a 4-billion-parameter text-to-image model under Apache 2.0. The 1-bit version uses binary weights at 1.125 effective bits per weight; the ternary version uses {-1, 0, +1} weights at 1.71 bits. PrismML claims it is the first image model in its parameter class to run directly on an iPhone, with checkpoints landing on Hugging Face and a companion iOS app on the App Store.
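The two bit-widths are consistent with a simple model in which each group of weights shares one FP16 scale. Below is a minimal sketch of that arithmetic. It assumes a group size of 128 and ideal log2(3)-bit packing of ternary codes. Both are illustrative assumptions, since the figures quoted here don't state the grouping.

```python
import math

def effective_bits(payload_bits: float, scale_bits: int = 16, group_size: int = 128) -> float:
    """Bits per weight once one shared scale is amortized over its group.

    Assumption: one FP16 scale per group of `group_size` weights. The value
    128 is illustrative, not a published detail of the release.
    """
    return payload_bits + scale_bits / group_size

binary = effective_bits(1.0)             # one sign bit per weight
ternary = effective_bits(math.log2(3))   # ideal entropy packing of {-1, 0, +1}
packed = effective_bits(8 / 5)           # practical packing: 5 ternary values per byte

print(f"binary:  {binary:.3f} bits/weight")   # 1.125
print(f"ternary: {ternary:.3f} bits/weight")  # 1.710
print(f"5-per-byte ternary: {packed:.3f}")    # ≈ 1.725
```

Under this model the reported 1.125 and 1.71 bits fall out directly. A byte-aligned ternary packing would cost slightly more, at about 1.725 bits.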
What This Enables
Drop the ternary checkpoint into a local pipeline and you get near-FLUX quality without sending prompts to a cloud endpoint. According to the released benchmark numbers, it retains 95% of FLUX.2 Klein 4B's scores, averaged across GenEval, HPSv3, and DPG-Bench, with a 6.4x smaller transformer footprint. The 1-bit variant retains 88% at an 8.3x reduction. That is small enough to keep an entire image model inside the unified memory of a recent iPhone or M-series Mac. PrismML ships a reference iOS app, Bonsai Studio, that runs the model entirely on-device. For creators, that means iterating on moodboards during a flight, generating storyboard frames at a coffee shop, or building offline image features into client apps without per-call inference billing.
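The sketch below converts the reduction ratios into absolute sizes. It assumes a 16-bit (BF16/FP16) baseline at 2 bytes per parameter. It counts only the 4B quantized parameters, leaving out the text encoder, VAE, and runtime activations. All of these are assumptions rather than published figures.

```python
PARAMS = 4e9          # 4B parameters, as stated in the release
BASELINE_BITS = 16    # assumption: a BF16/FP16 reference checkpoint

def footprint_gb(params: float, bits_per_weight: float) -> float:
    """Raw weight storage in gigabytes (1 GB = 1e9 bytes), ignoring runtime overhead."""
    return params * bits_per_weight / 8 / 1e9

baseline = footprint_gb(PARAMS, BASELINE_BITS)  # 8.0 GB

# (effective bits per weight, reported transformer footprint reduction)
variants = {"ternary": (1.71, 6.4), "1-bit": (1.125, 8.3)}
for name, (bits, reported_reduction) in variants.items():
    ideal = footprint_gb(PARAMS, bits)          # if every weight were quantized
    implied = baseline / reported_reduction     # what the published ratio implies
    print(f"{name}: {ideal:.2f} GB from bit-width vs {implied:.2f} GB from reported ratio")
```

Under these assumptions, the reported ratios imply about 1.25 GB for the ternary variant and 0.96 GB for the 1-bit variant. Pure bit-width arithmetic gives smaller sizes, roughly 0.86 GB and 0.56 GB. One plausible reason for the gap is that some tensors stay at higher precision, but that is a guess, not something the release states.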
Why It Matters
The on-device image generation race has been bottlenecked by transformer size. Full-precision SDXL and FLUX checkpoints strain or exceed the unified memory budgets of Apple Silicon and high-end Snapdragon devices. Bonsai Image 4B targets that gap with aggressive weight quantization rather than distillation. Because this approach keeps the full 4B parameter count, it should preserve prompt-following behavior better than a smaller dense model would. If the 95% quality claim holds up in independent testing, the Apache 2.0 open-weights release makes this a candidate base model for any creator app that wants local inference without per-request costs. It joins a small but growing 2026 cohort of mobile-grade open-weights models, alongside Stable Audio 3 for music.
Key Details
Two checkpoints ship at launch: a 1-bit variant with FP16 scaling factors and a ternary variant, both at 4B parameters. The Apache 2.0 license covers both weights and code. PrismML published a whitepaper alongside the release that describes the training procedure and the binary scaling approach. Supported platforms are Apple Silicon (iPhone, iPad, Mac) and CUDA GPUs, with reference inference code in the Bonsai-image-demo repository. The paper reports results on GenEval, HPSv3, and DPG-Bench, the same suite the FLUX team uses, which makes head-to-head comparison straightforward. PrismML positions the model as a baseline for further research into ultra-low-bit creative AI rather than as a frontier-quality replacement for FLUX or Imagen.
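The whitepaper's exact scaling method isn't reproduced here. To show what "1-bit with FP16 scaling factors" and ternary weights mean mechanically, the sketch below substitutes two widely used schemes. The first is sign binarization with a per-group mean-absolute-value scale, as in XNOR-Net. The second is absmean ternarization, as in BitNet b1.58. The group granularity, the choice of mean |w| as the scale, and all function names are illustrative assumptions, not PrismML's code.

```python
import struct
from typing import List, Tuple

def to_fp16(x: float) -> float:
    """Round a float to IEEE half precision, as if the scale were stored in FP16."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def quantize_binary(group: List[float]) -> Tuple[float, List[int]]:
    """Generic 1-bit scheme (not PrismML's): signs plus one FP16 scale, mean |w|."""
    scale = to_fp16(sum(abs(w) for w in group) / len(group))
    signs = [1 if w >= 0 else -1 for w in group]  # zero maps to +1 by convention
    return scale, signs

def quantize_ternary(group: List[float], eps: float = 1e-8) -> Tuple[float, List[int]]:
    """Generic absmean scheme: divide by mean |w|, round, clip to {-1, 0, +1}."""
    scale = to_fp16(sum(abs(w) for w in group) / len(group))
    codes = [max(-1, min(1, round(w / (scale + eps)))) for w in group]
    return scale, codes

def dequantize(scale: float, codes: List[int]) -> List[float]:
    """Reconstruct approximate weights from the shared scale and integer codes."""
    return [scale * c for c in codes]

if __name__ == "__main__":
    group = [0.42, -0.07, 0.0, -0.9, 0.31, -0.25, 0.05, 0.6]
    scale, codes = quantize_ternary(group)
    print(scale)   # 0.324951171875 (0.325 rounded to FP16)
    print(codes)   # [1, 0, 0, -1, 1, -1, 0, 1]
    print(dequantize(scale, codes))
```

Quantization-aware training schemes such as BitNet b1.58 typically apply quantizers like these in the forward pass. Gradients update full-precision latent weights through a straight-through estimator. Whether Bonsai trains this way is not stated here.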
What to Do Next
If you build creator apps, pull the ternary checkpoint from Hugging Face and run it against your existing FLUX prompt set to measure the delta. The 95% figure is an average across three benchmarks, and specific prompt families (faces, text rendering, hands) may regress more. iOS developers can install Bonsai Studio from the App Store to gauge real-device latency before integrating. For studios that already run a FLUX or SDXL pipeline, the most immediate uses are offline backup rendering, on-device drafts for mobile clients, and rapid iteration where cloud round-trip latency slows the creative loop.
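A single averaged number can hide a family that regresses badly, so the comparison is more useful broken out per prompt family. The sketch below assumes you have already scored the same prompts under both models with one higher-is-better metric. The family labels, the 0.95 threshold, and the function names are all illustrative assumptions.

```python
from statistics import mean
from typing import Dict, List

def retention_by_family(
    baseline: Dict[str, List[float]],
    candidate: Dict[str, List[float]],
) -> Dict[str, float]:
    """Ratio of mean candidate score to mean baseline score for each prompt family.

    Assumes a higher-is-better metric with a positive baseline mean per family,
    and that both dicts cover the same prompt set for each family.
    """
    return {
        family: mean(candidate[family]) / mean(scores)
        for family, scores in baseline.items()
    }

def flag_regressions(retention: Dict[str, float], threshold: float = 0.95) -> List[str]:
    """Families whose retention falls below the threshold, worst first."""
    below = [family for family, ratio in retention.items() if ratio < threshold]
    return sorted(below, key=lambda family: retention[family])
```

Computing one ratio per family, rather than a single pooled ratio, surfaces a weak family that the pooled average would smooth over.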