Key Highlights
- Google introduced Flex and Priority inference tiers for the Gemini API
- The Flex tier provides a 50% cost reduction for latency-tolerant background processes
- The Priority tier charges a 75–100% premium for mission-critical, real-time applications
- The Batch API keeps its 50% discount, with latency of up to 24 hours
- The Caching tier is priced by token volume and storage time
On April 2, Google updated its Gemini API pricing structure, establishing five service tiers: Standard, Flex, Priority, Batch, and Caching. The expansion gives developers more control over the tradeoffs between cost, response speed, and reliability across different application types.
The Flex tier targets background operations where immediate responses aren’t essential. By using underutilized compute resources during off-peak periods, it costs 50% less than standard pricing. Response latency ranges from 1 to 15 minutes, with no guaranteed delivery time. Suitable applications include CRM data synchronization, computational research models, and automated agent-based workflows.
Flex differs from the existing Batch API in that it uses synchronous endpoints. Developers do not have to manage separate input and output files or poll for job status. The result is the same 50% discount as Batch with less implementation overhead.
The Priority tier addresses the opposite end of the spectrum. Priced at 75% to 100% above standard rates, it supports time-sensitive, mission-critical operations. Responses arrive within milliseconds to seconds.
Google positions Priority for applications like real-time customer service chatbots, transaction fraud monitoring, and content safety filtering systems. When Priority tier usage surpasses allocated quotas, excess requests automatically route to the Standard tier, maintaining service availability rather than generating failures.
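The overflow behavior described above can be pictured with a toy model. This is only a sketch: the function name, the idea of counting quota in whole requests, and the fixed quota value are assumptions for illustration, since the announcement does not say how Google measures or enforces Priority quotas.

```python
def route_requests(num_requests: int, priority_quota: int) -> dict:
    """Toy model of Priority overflow: requests beyond the quota are
    served by the Standard tier instead of failing.

    Both arguments are hypothetical request counts, not real API values.
    """
    if num_requests < 0 or priority_quota < 0:
        raise ValueError("counts must be non-negative")

    # Requests that fit inside the quota stay on Priority.
    served_by_priority = min(num_requests, priority_quota)
    # Everything above the quota falls back to Standard rather than erroring.
    overflow_to_standard = num_requests - served_by_priority

    return {"priority": served_by_priority, "standard": overflow_to_standard}
```

With a hypothetical quota of 100, a burst of 120 requests would be split as 100 on Priority and 20 on Standard, so none are rejected.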
Complete Tier Architecture
The Batch API continues to offer a 50% discount on standard pricing, with latency of up to 24 hours. It serves high-volume offline processing where turnaround time matters little.
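To make the price differences concrete, the following sketch applies the percentages quoted in this article to a workload's Standard-tier cost. The multipliers come from the article's figures (50% off for Flex and Batch, a 75% to 100% premium for Priority). The function name and the dollar amounts in the usage note are illustrative assumptions, not published rates.

```python
# Price multipliers relative to the Standard tier, as (low, high) bounds.
# Priority is a range because the premium is quoted as 75% to 100%.
TIER_MULTIPLIERS = {
    "standard": (1.0, 1.0),
    "flex": (0.5, 0.5),       # 50% cost reduction
    "batch": (0.5, 0.5),      # 50% discount
    "priority": (1.75, 2.0),  # 75% to 100% premium
}


def estimate_cost_range(standard_cost: float, tier: str) -> tuple:
    """Return the (low, high) estimated cost for a workload on a given tier,
    given what the same workload would cost on Standard."""
    if tier not in TIER_MULTIPLIERS:
        raise ValueError(f"unknown tier: {tier}")
    low, high = TIER_MULTIPLIERS[tier]
    return (standard_cost * low, standard_cost * high)
```

For a workload that would cost 100.00 on Standard, this gives 50.00 on Flex or Batch and between 175.00 and 200.00 on Priority. Caching is left out because it is priced on different factors.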
The Caching tier is priced by the number of tokens cached and how long they are stored. Google recommends it for conversational agents with long system prompts, recurring video content analysis tasks, or database queries spanning large document repositories.
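One way to picture a price that depends on both token volume and storage time is a simple product of the two. This is a sketch under stated assumptions: the formula's shape, the per-million-token-hour unit, and the rate used in the usage note are placeholders, and the actual Caching price list may include other components.

```python
def caching_storage_cost(
    cached_tokens: int,
    storage_hours: float,
    rate_per_million_token_hours: float,
) -> float:
    """Estimate a storage charge that scales with tokens cached and time stored.

    rate_per_million_token_hours is a hypothetical placeholder rate,
    not a published price.
    """
    if cached_tokens < 0 or storage_hours < 0 or rate_per_million_token_hours < 0:
        raise ValueError("inputs must be non-negative")

    # Convert tokens to millions, then scale by storage duration and rate.
    millions_of_tokens = cached_tokens / 1_000_000
    return millions_of_tokens * storage_hours * rate_per_million_token_hours
```

Under this model, caching 2,000,000 tokens for 3 hours at a placeholder rate of 1.0 per million token-hours would cost 6.0. Doubling either the token count or the retention time doubles the charge.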
Both Flex and Priority are selected through a single service_tier parameter in API requests. Developers can switch tiers with a configuration change, and each API response indicates which tier processed the request.
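A request that selects a tier might be assembled along these lines. The article confirms only the parameter name service_tier. Its placement at the top level of the request body, the lowercase value spellings, and the response field that reports the serving tier are all assumptions in this sketch, and both helper functions are hypothetical rather than part of any SDK. No network call is made.

```python
from typing import Optional

# Assumed spellings for tier values; the article names the tiers
# but not the exact strings the API accepts.
ASSUMED_TIERS = {"standard", "flex", "priority"}


def build_request(prompt: str, service_tier: str = "standard") -> dict:
    """Build a GenerateContent-style request body with a tier selected."""
    if service_tier not in ASSUMED_TIERS:
        raise ValueError(f"unsupported service_tier: {service_tier}")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        # Parameter name is from the article; its location here is assumed.
        "service_tier": service_tier,
    }


def tier_that_served(response: dict) -> Optional[str]:
    """Read which tier handled the request from a response body.

    The field name is hypothetical; the article says only that responses
    indicate the tier. Returns None if the field is absent.
    """
    return response.get("service_tier")
```

Switching a background job from Standard to Flex would then be a one-argument change, for example build_request("Summarize this account history", service_tier="flex"). Checking tier_that_served on the response matters for Priority traffic, since requests over quota may be served by Standard.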
Flex is available to all paid-tier users for GenerateContent and Interactions API requests. Priority is restricted to Tier 2 and Tier 3 paid projects on the same endpoints.
Developer Benefits
The consolidated interface is the main advance in this release. Previously, running background and interactive workloads side by side required separate architectures for synchronous and asynchronous processing. Now both workload types can run through the same synchronous endpoints.
Google framed the change as part of its continued support for AI agent development, which often involves handling non-urgent background tasks alongside time-critical interactive operations.
Gemini API product manager Lucia Loher and engineering lead Hussein Hassan Harrirou announced these changes on April 2, 2026.

