Google's latest release, Gemini 2.5 Pro, has claimed the top spot in the WebDev Arena rankings, surpassing competitors like Claude to become the preferred choice for developers seeking exceptional coding capabilities.
This AI model offers a context window of one million tokens (expandable to two million), enabling it to handle large codebases and complex projects far beyond models like ChatGPT and Claude 3.7 Sonnet, whose context windows top out at roughly 128K and 200K tokens, respectively.
Gemini 2.5 Pro also achieved the highest scores on reasoning benchmarks, including a Mensa-style IQ test and Humanity's Last Exam, showcasing the advanced problem-solving abilities needed for sophisticated development tasks.
Earlier this year, Gemini 2.5 Pro ranked first across multiple categories, including coding, style control, and creative writing. Its massive context window lets it process large-scale code repositories and intricate projects that competing models cannot handle.
By one measure, Gemini stands out as the "smartest" of current AI models: TrackingAI has built a standardized way to compare models using verbal questions from the official Mensa Norway test.
In these evaluations, Gemini 2.5 Pro outperformed competitors even on custom questions not included in its training data.
In offline assessments, the new Gemini scored an IQ of 115, placing it in the "high intelligence" category; average human scores fall between 85 and 114 points. These figures need careful interpretation, however: AI systems do not possess IQ the way humans do, so such benchmark results are better read as rough proxies for reasoning ability than as literal measures of intelligence.
In AI-specific benchmarks, Gemini 2.5 Pro scored 86.7% on the AIME 2025 mathematics test and 84.0% on the GPQA science assessment. On Humanity's Last Exam (HLE), a newer and harder benchmark designed to avoid test saturation, Gemini 2.5 achieved 18.8%, a clear margin over OpenAI's o3-mini (14%) and Claude 3.7 Sonnet (8.9%).
The new version of Gemini 2.5 Pro is now freely available (with rate limits) to all Gemini users. Google previously described this release as the "experimental edition of 2.5 Pro," part of its family of "thinking models" designed to reason through a response before generating it rather than simply producing text.
Despite not winning every benchmark, Gemini has captured developers' attention with its versatility. From a single prompt, and without detailed specifications, the model can build complex applications: interactive web apps, endless-runner games, and visual simulations.
We tested the model by asking it to fix broken HTML5 code. It generated nearly 1,000 lines of code, surpassing the previous leader, Claude 3.7 Sonnet, in both output quality and adherence to the full set of instructions.
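For readers who want to try a similar test, the sketch below shows how such a request might be sent through Google's google-generativeai Python SDK. It is a minimal illustration rather than the exact setup we used: the model identifier string, the API key placeholder, and the sample HTML snippet are all assumptions.

```python
# Minimal sketch: asking Gemini to repair a broken HTML5 snippet via the
# google-generativeai SDK. Model ID and prompt are illustrative assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Model identifier assumed; check Google's model list for the current name.
model = genai.GenerativeModel("gemini-2.5-pro")

# A deliberately broken HTML5 document used as hypothetical test input.
broken_html = """<!DOCTYPE html>
<html>
<head><title>Demo<head>
<body>
  <canvas id="game" width="640" height="480">
  <script>document.getElementByID("game");</script>
</body>
"""

prompt = (
    "Fix all errors in the following HTML5 document and return only the "
    "corrected code:\n\n" + broken_html
)

response = model.generate_content(prompt)
print(response.text)  # corrected HTML returned by the model
```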
For working developers, Gemini 2.5 Pro costs $2.50 per million input tokens and $15.00 per million output tokens, positioning it as a more affordable alternative to some competitors while still offering impressive capabilities.
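At those rates, per-request cost is simple arithmetic. The short sketch below estimates it; the token counts are purely hypothetical examples, not measured usage.

```python
# Rough cost estimate for Gemini 2.5 Pro at the rates quoted above
# ($2.50 per million input tokens, $15.00 per million output tokens).
INPUT_PRICE_PER_M = 2.50    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical example: a 50K-token codebase prompt with a 4K-token answer.
print(f"${estimate_cost(50_000, 4_000):.3f}")  # ≈ $0.185
```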
On its advanced plan, the model can handle up to 30,000 lines of code, making it suitable for enterprise-level projects. Its multimodal capabilities, spanning text, code, audio, images, and video, add a flexibility that other coding-focused models do not match.