DeepMind, Google's AI research lab, has announced a new AI system designed to tackle problems with "machine-scorable" solutions.
In experimental trials, the system, called AlphaEvolve, helped optimize some of the infrastructure Google uses to train its AI models. DeepMind is building a user interface for interacting with AlphaEvolve and plans to launch an early access program for select academics before rolling it out more broadly.
A common challenge with most AI models is their tendency to hallucinate: because they are probabilistic, they sometimes confidently present false or fabricated content. Notably, some newer models, such as OpenAI's o3, hallucinate more than their predecessors, underscoring how stubborn the problem is.
AlphaEvolve's mechanism for curbing hallucinations is an automated evaluation pipeline: the system generates candidate answers, critiques them, and maintains a pool of possible solutions, with each candidate automatically assessed and scored for accuracy.
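To make that loop concrete, here is a minimal sketch in Python of a generate-and-score cycle in the spirit of what DeepMind describes. Everything in it is illustrative: the proposer stands in for the Gemini model generating candidate solutions, the evaluator stands in for the user-supplied scoring formula, and the numerical target is invented for the toy example.

```python
import random
from dataclasses import dataclass, field

# Illustrative stand-ins: in AlphaEvolve the proposer is a Gemini model
# producing candidate programs and the evaluator is the user-supplied
# scoring formula; here both are deliberately trivial.

TARGET = 3.14159  # invented quantity the candidates should approximate

@dataclass(order=True)
class Candidate:
    score: float                        # higher is better; drives selection
    value: float = field(compare=False)

def propose(pool: list[Candidate]) -> list[float]:
    """Mutate surviving candidates into new ones (stands in for the
    language model generating variations of promising solutions)."""
    parents = [c.value for c in pool] or [random.uniform(0.0, 10.0)]
    return [random.choice(parents) + random.gauss(0.0, 0.5) for _ in range(8)]

def evaluate(value: float) -> float:
    """Automated, machine-scorable check: negative distance to the target."""
    return -abs(value - TARGET)

def evolve(generations: int = 50, pool_size: int = 10) -> Candidate:
    pool: list[Candidate] = []
    for _ in range(generations):
        for v in propose(pool):
            pool.append(Candidate(score=evaluate(v), value=v))
        pool.sort(reverse=True)   # selection: keep the highest-scoring pool
        pool = pool[:pool_size]
    return pool[0]

print(evolve())  # the best candidate converges toward TARGET
```

The key property the sketch captures is that only machine-checked candidates survive each round, so a hallucinated answer that fails the evaluator is simply discarded rather than presented to the user.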
AlphaEvolve is not the first system to employ such an approach. Researchers, including a team from DeepMind a few years ago, have applied similar techniques across various mathematical domains. However, DeepMind asserts that AlphaEvolve leverages cutting-edge models, particularly the Gemini model, making it more capable than earlier iterations.
To use AlphaEvolve, a user provides a problem prompt, optionally supplemented with descriptions, equations, code snippets, and relevant literature. They must also supply an automatic evaluation mechanism, expressed as a formula, for scoring the generated solutions.
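As an illustration of what such an evaluation formula might look like, below is a hedged sketch in Python. The bin-packing task, the scoring rule, and every name are invented for the example; DeepMind has not published AlphaEvolve's evaluator interface.

```python
def evaluate(candidate_fn) -> float:
    """Hypothetical user-supplied evaluator: score a candidate bin-packing
    heuristic, where a candidate maps (items, bin_capacity) to a list of
    bins. Fewer bins is better; invalid packings score -inf."""
    items = [0.4, 0.7, 0.2, 0.5, 0.9, 0.1]
    bins = candidate_fn(items, bin_capacity=1.0)
    overflow = any(sum(b) > 1.0 for b in bins)
    lost_items = sorted(sum(bins, [])) != sorted(items)
    if overflow or lost_items:
        return float("-inf")   # reject invalid solutions outright
    return -float(len(bins))   # machine-scorable: fewer bins, higher score

def first_fit(items, bin_capacity):
    """A baseline candidate the system would then try to improve upon."""
    bins: list[list[float]] = []
    for item in items:
        for b in bins:
            if sum(b) + item <= bin_capacity:
                b.append(item)
                break
        else:
            bins.append([item])
    return bins

print(evaluate(first_fit))  # -4.0: first-fit uses four bins on this input
```

Anything expressible as such a formula, from bin counts to chip area to training throughput, can in principle serve as the score the system optimizes against.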
Because AlphaEvolve can only tackle problems it can evaluate automatically, its scope is limited to certain classes of questions, particularly in computer science and systems optimization. Another significant limitation is that the system can only express solutions as algorithms, making it ill-suited for problems without numerical answers.
To benchmark AlphaEvolve, DeepMind tested the system on a set of approximately 50 math problems spanning branches like geometry and combinatorics. According to DeepMind, AlphaEvolve "rediscovered" the best-known solutions 75% of the time and found improved solutions in 20% of cases.
DeepMind also evaluated AlphaEvolve's performance on practical challenges, such as improving efficiency at Google data centers and accelerating model training runs. The lab reports that AlphaEvolve devised an algorithm that continuously recovers, on average, 0.7% of Google's worldwide computing resources. Additionally, the system proposed an optimization that reduced the total time required to train Google's Gemini model by 1%.
It’s important to note that AlphaEvolve did not achieve any groundbreaking discoveries. In one experiment, the system identified a method to improve the design of Google’s TPU AI accelerator chips—a solution that had already been flagged by other tools.
Nevertheless, like many AI labs, DeepMind argues that AlphaEvolve offers significant utility: it can save time while allowing experts to focus on higher-priority tasks.