Google DeepMind is introducing a new feature called "Agentic Vision" for its Gemini 3 Flash model. The model is no longer limited to passively viewing images; it can now actively investigate them—though not all capabilities function automatically.
Conventional AI models process images in a single pass. If they miss details, they are forced to rely on guesswork. Google DeepMind aims to change this with Agentic Vision. The model can now iteratively zoom, crop, and annotate images by generating and executing Python code.
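Google hasn't published the exact code the model writes, but the generated snippets amount to ordinary image-manipulation Python. A minimal sketch of what a single "zoom" step could look like, with hypothetical file names and crop coordinates:

```python
# Illustrative only: the kind of snippet the model might generate and
# execute to "zoom" into a region of interest. The file name and crop
# box are made up for the example.
from PIL import Image

img = Image.open("input.png")

# Crop a region of interest (left, upper, right, lower) and upscale it
# so fine details become legible on re-inspection.
region = img.crop((820, 410, 1180, 660))
zoomed = region.resize((region.width * 3, region.height * 3), Image.LANCZOS)
zoomed.save("zoomed_region.png")
```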
The system operates on a think-act-observe loop. First, the model analyzes the request and the image, then formulates a plan. Next, it generates and runs Python code—for instance, to crop, rotate, or annotate the image. The results are added to the context window, allowing the model to review the new data before responding. According to Google, code execution leads to a 5% to 10% quality improvement across various visual benchmarks.
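How the pieces fit together can be sketched as a simple loop. This is an illustrative reconstruction based on Google's description, not the actual implementation; `model`, `sandbox`, and the message format are stand-ins:

```python
# Hedged sketch of a think-act-observe loop; not Google's code.
def agentic_vision_loop(model, sandbox, prompt, image, max_steps=5):
    context = [prompt, image]
    for _ in range(max_steps):
        step = model.generate(context)          # think: analyze and plan
        if step.code is None:                   # no further tool use needed
            return step.text                    # final answer
        result = sandbox.run_python(step.code)  # act: execute generated code
        context.append(step.code)               # observe: feed the code and
        context.append(result)                  # its outputs (e.g. crops)
                                                # back into the context window
    return model.generate(context).text         # answer with what was gathered
```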
However, the concept isn't entirely new: OpenAI introduced similar functionality with its o3 model.
Blueprint Analysis Startup Reports Accuracy Gains
As a real-world example, Google cites PlanCheckSolver.com, a platform that checks construction blueprints for compliance. The startup reports a 5% increase in accuracy by having Gemini 3 Flash iteratively inspect high-resolution drawings. The model crops areas like roof edges or building sections and analyzes them one by one.
For image annotation, the model can draw bounding boxes and labels directly on the image. Google demonstrated this with finger counting—the model marks each finger with a box and number to prevent miscounts.
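A comparable annotation step can be reproduced with a few lines of Pillow. The box coordinates and labels below are invented for illustration:

```python
# Illustrative annotation step, similar in spirit to the finger-counting
# demo. Detections are hypothetical.
from PIL import Image, ImageDraw

img = Image.open("hand.png")
draw = ImageDraw.Draw(img)

# Hypothetical detections: one bounding box per finger.
boxes = [(100, 40, 160, 220), (180, 20, 240, 210), (260, 30, 320, 215),
         (340, 50, 400, 225), (420, 120, 480, 260)]

for i, box in enumerate(boxes, start=1):
    draw.rectangle(box, outline="red", width=3)           # bounding box
    draw.text((box[0], box[1] - 14), str(i), fill="red")  # numbered label

img.save("hand_annotated.png")
print(f"Counted {len(boxes)} fingers")
```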
For visual math problems, the model can parse tables and run calculations within a Python environment instead of hallucinating results. It can then output the findings as a chart.
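In practice, that step looks like ordinary data-analysis code. A hedged sketch, assuming the model has already read the numbers out of the image; the values and column names are placeholders:

```python
# Sketch of the compute-then-chart step. The figures stand in for
# whatever the model extracted from a table in the image.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"quarter": ["Q1", "Q2", "Q3", "Q4"],
                   "revenue": [120.0, 135.5, 150.25, 162.0]})

# Exact arithmetic in the sandbox instead of a guessed answer.
df["growth_pct"] = df["revenue"].pct_change() * 100

df.plot(x="quarter", y="growth_pct", kind="bar", legend=False)
plt.ylabel("Quarter-over-quarter growth (%)")
plt.savefig("growth.png")
```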
Many Features Still Require Explicit Instructions
Google acknowledges that not all features work automatically. While the model can autonomously handle zooming in on details, functions like rotating images or solving visual math problems still require explicit prompts. The company plans to address these limitations in future updates.
Agentic Vision is currently limited to the Flash model. Google says it plans to expand the feature to other model sizes and to add tools such as web search and reverse image search.
Agentic Vision is accessible via the Gemini API in Google AI Studio and Vertex AI. The feature is also rolling out to the Gemini apps, where users can select "Think" from the model dropdown menu. Demo applications and developer documentation are available as well.
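For developers, enabling this behavior comes down to turning on the Gemini API's code execution tool. A minimal sketch using the google-genai Python SDK; the code execution tool is an existing Gemini API feature, but the model identifier used here is an assumption and may differ from the official ID:

```python
# Hedged sketch with the google-genai SDK.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

with open("blueprint.png", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-3-flash",  # assumed model ID
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Zoom into the roof edge and check the drainage detail.",
    ],
    config=types.GenerateContentConfig(
        # Lets the model write and run Python to crop, zoom, or annotate.
        tools=[types.Tool(code_execution=types.ToolCodeExecution())],
    ),
)
print(response.text)
```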