Google has introduced a new feature in its Gemini API, claiming it will reduce the cost for third-party developers using its latest AI models.
The feature, called "implicit caching," is said to cut costs by 75% for "repeated context" passed to the model via the Gemini API. It supports Google's Gemini 2.5 Pro and 2.5 Flash models.
This could be good news for developers, as the cost of using cutting-edge models has been rising.
Caching is a widely adopted practice in the AI industry: it cuts computational demands and costs by reusing frequently accessed or precomputed data. For instance, a cache can store answers to questions users frequently ask a model, eliminating the need to regenerate responses to the same queries.
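To make the general idea concrete (this is purely illustrative, not how Gemini implements caching internally), here is a toy Python sketch in which identical prompts are answered from memory instead of triggering a new generation; the model call is a hypothetical stand-in:

```python
import time
from functools import lru_cache

def expensive_model_call(prompt: str) -> str:
    """Stand-in for a costly model invocation (hypothetical)."""
    time.sleep(1)  # simulate generation latency
    return f"answer to: {prompt}"

@lru_cache(maxsize=1024)
def answer(prompt: str) -> str:
    # Identical prompts are served from memory; only the first call pays.
    return expensive_model_call(prompt.strip())

answer("What is caching?")  # slow path: generates and stores the response
answer("What is caching?")  # fast path: returned straight from the cache
```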
Previously, Google offered only explicit prompt caching, meaning developers had to define their most frequently used prompts themselves. Cost savings were guaranteed, but explicit prompt caching often required significant manual work.
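For context, explicit caching in the google-genai Python SDK looks roughly like the sketch below. The API key, model string, document, and TTL are placeholders, and the exact parameters may differ by SDK version; the point is that the developer must decide up front which content to cache:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

long_document = open("reference_doc.txt").read()  # hypothetical shared context

# Explicit caching: the developer picks the content to cache ahead of time.
cache = client.caches.create(
    model="gemini-2.5-pro",
    config=types.CreateCachedContentConfig(
        display_name="shared-reference-context",
        contents=[long_document],
        ttl="3600s",  # cached content expires after an hour
    ),
)

# Subsequent requests reference the cache by name to get the discount.
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Summarize section 2 of the document.",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)
```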
Some developers were dissatisfied with how Google implemented explicit caching on Gemini 2.5 Pro, stating it could lead to unexpected high API bills. Last week, complaints peaked, prompting the Gemini team to apologize and commit to making changes.
In contrast to explicit caching, implicit caching is automatic. It is enabled by default for Gemini 2.5 models, and Google passes the cost savings on whenever a Gemini API request hits the cache.
"When you send a request to one of the Gemini 2.5 models, if the request shares a common prefix with a previous request, then it qualifies to hit the cache," Google explained in a blog post. "We will dynamically pass the cost savings back to you."
According to Google's developer documentation, the minimum prompt size for implicit caching is 1,024 tokens on 2.5 Flash and 2,048 tokens on 2.5 Pro, which isn't particularly large, so triggering these automatic savings shouldn't require much effort. Tokens are the raw bits of data that models process, with a thousand tokens equivalent to about 750 words.

Given that Google's previous claims about cost savings from caching drew complaints, there are some considerations buyers should keep in mind. First, Google advises developers to keep repeated context at the beginning of requests to maximize the chance of an implicit cache hit. Context that may vary from request to request should be appended at the end, the company said.
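As a rough illustration of that guidance, the sketch below structures requests with the stable, repeated context first and the per-request question last. It assumes the google-genai Python SDK; the API key, file, and questions are placeholders, and the usage-metadata field is how the SDK currently reports cache hits:

```python
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

# Must exceed the minimum prompt size (2,048 tokens on 2.5 Pro) to qualify.
stable_context = open("product_manual.txt").read()

def ask(question: str) -> str:
    response = client.models.generate_content(
        model="gemini-2.5-pro",
        # Shared prefix first, variable question last, per Google's guidance.
        contents=[stable_context, question],
    )
    # usage_metadata reports how many prompt tokens were served from cache.
    print(response.usage_metadata.cached_content_token_count)
    return response.text

ask("What does section 3 cover?")
ask("Summarize the warranty terms.")  # shares the prefix; may hit the cache
```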
Moreover, Google hasn't provided any third-party verification that the new implicit caching system delivers the promised automatic savings, so we'll have to wait for feedback from early adopters.