Google launches Gemini 2.5 Flash-Lite and makes 2.5 Flash and 2.5 Pro widely available
Google has announced general availability of its Gemini 2.5 Flash and Gemini 2.5 Pro AI models for production use. The latest expansion of the Gemini 2.5 series also includes a preview release of Gemini 2.5 Flash-Lite, positioned as the fastest and most cost-effective model in the lineup. "Our Gemini 2.5 series is designed as a range of hybrid reasoning models that deliver exceptional performance while achieving Pareto-optimal efficiency in both cost and speed," the company stated. "Today's launch of the stable versions of 2.5 Pro and Flash represents our next critical step, complemented by the Flash-Lite preview, our most economical and rapid model to date."
The transition of Gemini 2.5 Flash and Pro from preview to full deployment has benefited from extensive feedback across enterprise and developer communities. Major companies including Snap, SmartBear, Spline, and Rooms have already integrated these models into their application ecosystems. The newly introduced Gemini 2.5 Flash-Lite is optimized specifically for workloads requiring speed and efficiency, and is currently available in preview for developer evaluation. With an architecture that prioritizes low-latency processing and reduced compute consumption, Flash-Lite is well suited to large-scale applications that demand both cost-effectiveness and quick response times.
Despite its compact design, Flash-Lite preserves the core capabilities of the Gemini 2.5 series, including support for a 1M-token context window for handling extensive documents, dialogues, and codebases. The model integrates with Google's search and code execution tools to process multimodal inputs and deliver precise responses across diverse tasks. All models in the Gemini 2.5 series use a Mixture-of-Experts (MoE) architecture, which selectively activates only the relevant neural network components for a given input, improving hardware utilization and reducing inference costs.
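The routing idea behind MoE can be illustrated with a minimal, generic top-k gating sketch. This is not Google's implementation; the expert and gate weights here are random placeholders, and real MoE layers sit inside transformer blocks and are trained end to end. The point is simply that only `top_k` of the `n_experts` sub-networks run per input, so compute scales with `top_k` rather than with the total expert count.

```python
import numpy as np

def moe_forward(x, experts, gate_weights, top_k=2):
    """Route input x to the top_k experts with the highest gate scores.

    Only the selected experts are evaluated; the rest stay inactive,
    which is where MoE's inference savings come from.
    """
    scores = gate_weights @ x                       # one score per expert
    top = np.argsort(scores)[-top_k:]               # indices of best experts
    probs = np.exp(scores[top] - scores[top].max())
    probs /= probs.sum()                            # softmax over selected experts
    # Weighted sum of the chosen experts' outputs.
    return sum(p * experts[i](x) for p, i in zip(probs, top))

rng = np.random.default_rng(0)
dim, n_experts = 4, 8
# Each "expert" here is just a small linear map, for illustration only.
weights = [rng.standard_normal((dim, dim)) for _ in range(n_experts)]
experts = [lambda v, W=W: W @ v for W in weights]
gate = rng.standard_normal((n_experts, dim))

out = moe_forward(rng.standard_normal(dim), experts, gate, top_k=2)
print(out.shape)  # (4,)
```

With `top_k=2` out of 8 experts, each token pays for two expert evaluations instead of eight, which is the efficiency trade the article describes.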
These models are the first generation trained on Google's in-house TPUv5p AI chips, leveraging enhanced software clusters to address training challenges. The production-ready status of Gemini 2.5 Pro and Flash ensures reliable performance on complex tasks such as advanced coding, sophisticated reasoning, and multimodal understanding, while the lineup differentiates by performance profile: Pro for power, Flash for speed, and Flash-Lite for maximum efficiency.
Developers can access stable versions of Gemini 2.5 Flash and Pro through Google AI Studio, Vertex AI, and the Gemini app. The preview version of Flash-Lite is available via Google AI Studio and Vertex AI, with customized variants already integrated into Google Search to enhance AI-powered search functionalities. The pricing structure for the expanded Gemini 2.5 series reflects varying capabilities and use cases: Flash-Lite offers the most economical entry point at $0.10 per million input tokens (text/images/video) and $0.40 per million output tokens, while Flash is priced at $0.30 per million input tokens and $2.50 per million output tokens.
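Using the per-million-token rates quoted above, a quick back-of-the-envelope cost check looks like this (the function and dictionary names are illustrative, not part of any Google SDK):

```python
# Published per-million-token rates in USD, from the announcement.
PRICING = {
    "gemini-2.5-flash-lite": {"input": 0.10, "output": 0.40},
    "gemini-2.5-flash":      {"input": 0.30, "output": 2.50},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate a request's cost in USD from its token counts."""
    rates = PRICING[model]
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000

# Example: a 200k-token input producing a 10k-token response.
lite_cost = estimate_cost("gemini-2.5-flash-lite", 200_000, 10_000)
flash_cost = estimate_cost("gemini-2.5-flash", 200_000, 10_000)
print(f"Flash-Lite: ${lite_cost:.4f}")  # $0.0240
print(f"Flash:      ${flash_cost:.4f}")  # $0.0850
```

For this workload, Flash-Lite comes in at under a third of Flash's cost, with the gap widening for output-heavy requests given the larger spread in output rates.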