Google walked back its compute-based Gemini usage limits on May 28, 2026, just nine days after introducing them at I/O 2026. The biggest change for creators: 3.1 Flash-Lite prompts are now completely free and no longer count against any quota. Failed requests also stop draining credits, single 3.1 Pro prompts get a per-prompt cap, and Google AI Ultra subscribers received double the Gemini Omni video generations after the company fixed a quota bug. 9to5Google reported the rollback the same day Google VP Josh Woodward posted the update.
How to integrate this
If you batch-prompt Gemini for short tasks (captions, alt text, metadata fields, single-image edits, quick rewrites), switch them to 3.1 Flash-Lite now and stop watching the quota. The model is unmetered for Google AI Pro and Ultra subscribers across the Gemini app. Reserve 3.1 Pro for prompts that genuinely need the deeper model. Each Pro prompt now has a ceiling on how much quota it can consume, so a single complex request can no longer drain your daily allowance. Check the live counter at gemini.google.com/usage before kicking off any long batch run.
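If you plan batches in a script before running them in the app, the triage described above can be sketched as a small router. This is a minimal sketch under assumptions: the task-kind labels, the model label strings, and the `route` helper are hypothetical names chosen for illustration. None of them is an official identifier, and the code calls no Gemini service.

```python
from collections import defaultdict

# Hypothetical labels for illustration only; not official model identifiers.
FLASH_LITE = "3.1 Flash-Lite"
PRO = "3.1 Pro"

# Short task kinds of the sort the article lists; anything else goes to Pro.
SHORT_TASK_KINDS = {"caption", "alt_text", "metadata", "image_edit", "rewrite"}


def route(tasks):
    """Group (kind, prompt) pairs into per-model prompt queues."""
    queues = defaultdict(list)
    for kind, prompt in tasks:
        model = FLASH_LITE if kind in SHORT_TASK_KINDS else PRO
        queues[model].append(prompt)
    return dict(queues)


tasks = [
    ("caption", "Write a caption for photo 12"),
    ("alt_text", "Describe photo 12 for screen readers"),
    ("long_analysis", "Compare these three scripts for pacing"),
]
queues = route(tasks)
```

Defaulting unknown task kinds to the Pro queue is a deliberate choice here: a task that was not explicitly marked as short is assumed to need the deeper model, which keeps the Pro queue visible and easy to review before you spend quota on it.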
Why it matters
The compute-based system Google rolled out at I/O 2026 charged quota by request complexity rather than request count, and creators with large files or chained prompts hit caps within hours. Android Authority captured Woodward's acknowledgement that "users were encountering limits sooner than they should." For high-volume creator workflows, the rollback restores predictability: you can plan a 200-prompt rewrite pass on Flash-Lite, know it will not cost a single credit, and keep Pro quota for the handful of prompts that need it.
Key details
Google announced six changes. Five are live or rolling out now, and one is still to come:
- 3.1 Flash-Lite prompts are free across the Gemini app for all paid tiers.
- Failed requests no longer count against quota: usage lost to errors on Google's side is refunded.
- 3.1 Pro prompts get a per-prompt quota cap so one prompt cannot drain a day.
- Google AI Ultra subscribers now have 2x Omni video generations after a quota bug fix.
- Model selection persists across sessions, so you stop re-picking Flash-Lite every chat.
- Pay-as-you-go top-up credits are confirmed coming but not live yet, per Heise.
This builds on Google's May 25 token-reduction work on Gemini 3.5 Flash Low in Antigravity, where the lighter model had already cut token use by 45% on agent runs. Flash-Lite being free in the consumer app extends that efficiency push directly to creators.
What to do next
Open the Gemini app, switch your default model to 3.1 Flash-Lite for routine work, and check the usage dashboard once a week. If you hit a Pro cap on a complex prompt, split the file or break the chain into smaller prompts rather than retrying the same one. Ultra subscribers should run their planned Omni video batch now while the doubled allowance is live.
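For text files, the splitting step can be done before you paste anything into the app. The following is a minimal sketch, assuming the input is plain text with paragraphs separated by blank lines. The `split_into_chunks` helper and the 2,000-character default are hypothetical; Google has not published the per-prompt cap as a character count, so treat the size as a starting point to tune.

```python
def split_into_chunks(text, max_chars=2000):
    """Pack blank-line-separated paragraphs into chunks of up to max_chars."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue  # skip empty paragraphs left by extra blank lines
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A paragraph longer than max_chars becomes its own oversized chunk.
            current = para
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccc"
parts = split_into_chunks(sample, max_chars=10)
```

Splitting on paragraph boundaries rather than at a fixed character offset keeps each smaller prompt readable on its own, at the cost of letting a single very long paragraph exceed the target size.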