Google DeepMind is introducing a new feature called "Agentic Vision" for its Gemini 3 Flash model. The model is no longer limited to passively viewing images; it can now actively investigate them—though not all capabilities function automatically.
Conventional AI models process images in a single pass. If they miss details, they are forced to rely on guesswork. Google DeepMind aims to change this with Agentic Vision. The model can now iteratively zoom, crop, and annotate images by generating and executing Python code.
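Google hasn't published the exact code the model writes, but the generated snippets amount to ordinary image-manipulation Python. A minimal sketch of what a single "zoom" step could look like, with hypothetical file names and crop coordinates:

```python
# Illustrative only: the kind of snippet the model might generate and
# execute to "zoom" into a region of interest. The file name and crop
# box are made up for the example.
from PIL import Image

img = Image.open("input.png")

# Crop a region of interest (left, upper, right, lower) and upscale it
# so fine details become legible on re-inspection.
region = img.crop((820, 410, 1180, 660))
zoomed = region.resize((region.width * 3, region.height * 3), Image.LANCZOS)
zoomed.save("zoomed_region.png")
```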
The system operates on a think-act-observe loop. First, the model analyzes the request and the image, then formulates a plan. Next, it generates and runs Python code—for instance, to crop, rotate, or annotate the image. The results are added to the context window, allowing the model to review the new data before responding. According to Google, code execution leads to a 5% to 10% quality improvement across various visual benchmarks.
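How the pieces fit together can be sketched as a simple loop. This is an illustrative reconstruction based on Google's description, not the actual implementation; `model`, `sandbox`, and the message format are stand-ins:

```python
# Hedged sketch of a think-act-observe loop; not Google's code.
def agentic_vision_loop(model, sandbox, prompt, image, max_steps=5):
    context = [prompt, image]
    for _ in range(max_steps):
        step = model.generate(context)          # think: analyze and plan
        if step.code is None:                   # no further tool use needed
            return step.text                    # final answer
        result = sandbox.run_python(step.code)  # act: execute generated code
        context.append(step.code)               # observe: feed the code and
        context.append(result)                  # its outputs (e.g. crops)
                                                # back into the context window
    return model.generate(context).text         # answer with what was gathered
```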
However, the concept isn't entirely new: OpenAI introduced similar functionality with its o3 model.
Blueprint Analysis Startup Reports Accuracy Gains
As a real-world example, Google cites PlanCheckSolver.com, a platform that checks construction blueprints for compliance. The startup reports a 5% increase in accuracy by having Gemini 3 Flash iteratively inspect high-resolution drawings. The model crops areas like roof edges or building sections and analyzes them one by one.
For image annotation, the model can draw bounding boxes and labels directly on the image. Google demonstrated this with finger counting—the model marks each finger with a box and number to prevent miscounts.
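A comparable annotation step can be reproduced with a few lines of Pillow. The box coordinates and labels below are invented for illustration:

```python
# Illustrative annotation step, similar in spirit to the finger-counting
# demo. Detections are hypothetical.
from PIL import Image, ImageDraw

img = Image.open("hand.png")
draw = ImageDraw.Draw(img)

# Hypothetical detections: one bounding box per finger.
boxes = [(100, 40, 160, 220), (180, 20, 240, 210), (260, 30, 320, 215),
         (340, 50, 400, 225), (420, 120, 480, 260)]

for i, box in enumerate(boxes, start=1):
    draw.rectangle(box, outline="red", width=3)           # bounding box
    draw.text((box[0], box[1] - 14), str(i), fill="red")  # numbered label

img.save("hand_annotated.png")
print(f"Counted {len(boxes)} fingers")
```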
For visual math problems, the model can parse tables and run calculations within a Python environment instead of hallucinating results. It can then output the findings as a chart.
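In practice, that step looks like ordinary data-analysis code. A hedged sketch, assuming the model has already read the numbers out of the image; the values and column names are placeholders:

```python
# Sketch of the compute-then-chart step. The figures stand in for
# whatever the model extracted from a table in the image.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"quarter": ["Q1", "Q2", "Q3", "Q4"],
                   "revenue": [120.0, 135.5, 150.25, 162.0]})

# Exact arithmetic in the sandbox instead of a guessed answer.
df["growth_pct"] = df["revenue"].pct_change() * 100

df.plot(x="quarter", y="growth_pct", kind="bar", legend=False)
plt.ylabel("Quarter-over-quarter growth (%)")
plt.savefig("growth.png")
```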
Many Features Still Require Explicit Instructions
Google acknowledges that not all features work automatically. While the model can autonomously handle zooming in on details, functions like rotating images or solving visual math problems still require explicit prompts. The company plans to address these limitations in future updates.
Agentic Vision is currently limited to the Flash model. Google says it plans to expand the feature to other model sizes and to add tools such as web search and reverse image search.
Agentic Vision is accessible via the Gemini API in Google AI Studio and Vertex AI. The feature is also rolling out to the Gemini apps, where users can select "Think" from the model dropdown menu. Demo applications and developer documentation are available as well.
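For developers, enabling this behavior comes down to turning on the Gemini API's code execution tool. A minimal sketch using the google-genai Python SDK; the code execution tool is an existing Gemini API feature, but the model identifier used here is an assumption and may differ from the official ID:

```python
# Hedged sketch with the google-genai SDK.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

with open("blueprint.png", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-3-flash",  # assumed model ID
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Zoom into the roof edge and check the drainage detail.",
    ],
    config=types.GenerateContentConfig(
        # Lets the model write and run Python to crop, zoom, or annotate.
        tools=[types.Tool(code_execution=types.ToolCodeExecution())],
    ),
)
print(response.text)
```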