Google brought its AI Edge stack to macOS on June 3, shipping three Apple Silicon apps in one wave: the AI Edge Gallery for running and prompting models, AI Edge Eloquent for on-device voice dictation, and an OpenAI-compatible local serve mode in LiteRT-LM. The flagship local model is Gemma 4 12B, sized to fit a 16GB MacBook.
What this enables: a fully local Gemma stack on a MacBook
Install AI Edge Gallery from the Mac App Store (macOS 14, Apple M1 or newer), pull the Gemma 4 12B weights from inside the app, and you have a multimodal chat surface that runs without an internet connection. Open a terminal and run litert-lm serve to expose the same model on an OpenAI-compatible http://localhost:8080/v1 endpoint that drops straight into Cursor, Continue, or any client expecting an OpenAI base URL. Run AI Edge Eloquent in the background and dictate code or prose into any text field with no cloud round-trip. Per 9to5Mac, third-party tracker MWM counted 71,783 downloads on launch day.
Why It Matters
Local Gemma was already possible through Ollama and LM Studio, but it required wiring three or four tools to get parity with cloud Gemini: a model runner, a clipboard dictation tool, and an OpenAI shim for IDEs. Google now ships all three from the same vendor, which removes the integration friction that has kept local AI as a hobbyist setup rather than a default workflow. Pair this with our coverage of the Gemma 4 12B encoder-free architecture that powers the release. Kingy AI's breakdown frames the macOS launch as the moment laptop-class local AI became a default rather than a niche.
Key Details
Gemma 4 12B is multimodal (text + image) and tuned for agentic workflows, with tool-call support exposed through the LiteRT-LM serve API. Minimum system: macOS 14, Apple M1 or newer, 16GB of unified memory for the 12B model (smaller Gemma 4 sizes fit on 8GB Macs). AI Edge Eloquent uses an on-device speech model and never sends audio to Google's servers. Coverage from Let's Data Science notes the iOS and Android versions of AI Edge Gallery shipped earlier in 2026; macOS is the third platform in the family and the first with full developer-facing serve mode. Adoption data from SquaredTech tracks the launch-day download spike against prior Edge Gallery releases.
What to Do Next
If you currently route IDE autocomplete through a cloud Gemini or Claude key, switch your Cursor or Continue base URL to http://localhost:8080/v1 and test how Gemma 4 12B holds up on your codebase. Creators using local LLMs alongside captioning tools like Caption Creator with Ollama or LM Studio can keep their existing pipeline and add Google's serve mode as a second backend for A/B comparison. Builders running mistral.rs for local inference on workstations should treat AI Edge as the Mac-specific complement rather than a replacement.