Google's latest release, Gemini 2.5 Pro, has claimed the top spot in the WebDev Arena rankings, surpassing competitors like Claude to become the preferred choice for developers seeking exceptional coding capabilities.
This AI model offers a context window of one million tokens (expandable to two million), enabling it to handle large codebases and complex projects far beyond models like ChatGPT and Claude 3.7 Sonnet, whose context windows top out at roughly 128K and 200K tokens, respectively.
Gemini 2.5 Pro also achieved the highest scores on reasoning benchmarks, including a Mensa-style IQ test and Humanity's Last Exam, showcasing the advanced problem-solving abilities needed for sophisticated development tasks.
Earlier this year, Gemini 2.5 Pro ranked first across multiple categories, including coding, style control, and creative writing. Its massive context window lets it process large-scale code repositories and intricate projects that competing models cannot handle.
By one measure, Gemini stands out as the "smartest" of current AI models: TrackingAI has built a standardized way to compare models using verbal questions from the official Mensa Norway test.
In these evaluations, Gemini 2.5 Pro outperformed competitors even on custom questions not included in its training data.
In offline assessments, the new Gemini scored an IQ of 115, placing it in the "high intelligence" category; average human scores fall between 85 and 114 points. These figures need careful interpretation, however: AI systems do not possess IQ the way humans do, so such benchmark results are better read as rough proxies for reasoning ability than as literal measures of intelligence.
In AI-specific benchmarks, Gemini 2.5 Pro scored 86.7% on the AIME 2025 mathematics test and 84.0% on the GPQA science assessment. On Humanity's Last Exam (HLE), a newer and harder benchmark designed to avoid test saturation, Gemini 2.5 achieved 18.8%, a clear margin over OpenAI's o3-mini (14%) and Claude 3.7 Sonnet (8.9%).
The new version of Gemini 2.5 Pro is now freely available (with rate limits) to all Gemini users. Google previously described this release as the "experimental edition of 2.5 Pro," part of its family of "thinking models" designed to reason through a response before generating it rather than simply producing text.
Despite not winning every benchmark, Gemini has captured developers' attention with its versatility. From a single prompt, and without detailed specifications, the model can build complex applications: interactive web apps, endless-runner games, and visual simulations.
We tested the model by asking it to fix broken HTML5 code. It generated nearly 1,000 lines of code, surpassing the previous leader, Claude 3.7 Sonnet, in both output quality and adherence to the full set of instructions.
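For readers who want to try a similar test, the sketch below shows how such a request might be sent through Google's google-generativeai Python SDK. It is a minimal illustration rather than the exact setup we used: the model identifier string, the API key placeholder, and the sample HTML snippet are all assumptions.

```python
# Minimal sketch: asking Gemini to repair a broken HTML5 snippet via the
# google-generativeai SDK. Model ID and prompt are illustrative assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Model identifier assumed; check Google's model list for the current name.
model = genai.GenerativeModel("gemini-2.5-pro")

# A deliberately broken HTML5 document used as hypothetical test input.
broken_html = """<!DOCTYPE html>
<html>
<head><title>Demo<head>
<body>
  <canvas id="game" width="640" height="480">
  <script>document.getElementByID("game");</script>
</body>
"""

prompt = (
    "Fix all errors in the following HTML5 document and return only the "
    "corrected code:\n\n" + broken_html
)

response = model.generate_content(prompt)
print(response.text)  # corrected HTML returned by the model
```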
For working developers, Gemini 2.5 Pro costs $2.50 per million input tokens and $15.00 per million output tokens, positioning it as a more affordable alternative to some competitors while still offering impressive capabilities.
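At those rates, per-request cost is simple arithmetic. The short sketch below estimates it; the token counts are purely hypothetical examples, not measured usage.

```python
# Rough cost estimate for Gemini 2.5 Pro at the rates quoted above
# ($2.50 per million input tokens, $15.00 per million output tokens).
INPUT_PRICE_PER_M = 2.50    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical example: a 50K-token codebase prompt with a 4K-token answer.
print(f"${estimate_cost(50_000, 4_000):.3f}")  # ≈ $0.185
```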
On its advanced plan, the model can handle up to 30,000 lines of code, making it suitable for enterprise-level projects. Its multimodal capabilities, spanning text, code, audio, images, and video, add a flexibility that other coding-focused models do not match.