OpenAI Launches ChatGPT Agent to Automate Multi-Step Browser Tasks
OpenAI unveils next-generation ChatGPT Agent capable of browser-based task automation
OpenAI today launched an artificial intelligence agent that can carry out complex, multi-step tasks inside a web browser. The system runs on an enhanced reasoning model and outperforms the company's earlier models on several benchmarks.
The ChatGPT Agent is designed to automate workflows that span multiple cloud applications. Developers can delegate tasks such as retrieving code files from GitHub and saving them to Google Drive folders. The system can also scan those files for vulnerabilities before they are stored.
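The scan-before-store ordering described above can be illustrated with a minimal sketch. This is not OpenAI's implementation: the function name `store_if_clean` and the `scan` and `store` callables are hypothetical placeholders, and no real GitHub or Google Drive API is called. The only point shown is that a file reaches storage only when the scanner reports no findings.

```python
from typing import Callable, Dict, List


def store_if_clean(
    files: Dict[str, bytes],
    scan: Callable[[str, bytes], List[str]],
    store: Callable[[str, bytes], None],
) -> Dict[str, List[str]]:
    """Store each file only if `scan` returns no findings.

    `scan` and `store` are hypothetical callables supplied by the caller;
    they stand in for a vulnerability scanner and a storage backend.
    Returns a mapping of rejected file names to their findings.
    """
    rejected: Dict[str, List[str]] = {}
    for name, content in files.items():
        findings = scan(name, content)
        if findings:
            # Flagged files are reported and never passed to `store`.
            rejected[name] = findings
        else:
            store(name, content)
    return rejected


def toy_scan(name: str, content: bytes) -> List[str]:
    """Toy scanner (assumption): flags any file containing a call to eval()."""
    return ["use of eval()"] if b"eval(" in content else []
```

In use, a caller would pass the retrieved files together with its own scanner and storage function, then review the returned mapping to see which files were held back and why.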
The agent uses two browser interfaces for online interactions. The first is a text-based browser that handles "simplified reasoning queries," while the second is a visual browser that navigates pages through their GUI elements, much as a human user would.
The agent includes safeguards intended to keep users in control. For sensitive operations such as purchases, it requires explicit user authorization before proceeding. OpenAI recommends that users supervise the agent while it works, and built-in controls let them stop a task, finish it manually, or update the instructions.
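The authorization step can be pictured as a simple gate in front of each action. The sketch below is an illustration under stated assumptions, not the agent's actual mechanism: `Action`, `SENSITIVE_KINDS`, `run_action`, and the `confirm` and `execute` callables are all hypothetical names, and the set of sensitive kinds contains only "purchase" because that is the one example the article gives.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: the article names purchases as the only example of a sensitive action.
SENSITIVE_KINDS = {"purchase"}


@dataclass
class Action:
    kind: str          # e.g. "purchase" or "navigate"
    description: str   # human-readable summary shown to the user


def run_action(
    action: Action,
    execute: Callable[[Action], str],
    confirm: Callable[[Action], bool],
) -> str:
    """Run `action`, asking the user first when its kind is sensitive.

    `confirm` is a hypothetical callback that returns True only when the
    user explicitly approves; `execute` performs the action itself.
    """
    if action.kind in SENSITIVE_KINDS and not confirm(action):
        # Without explicit approval, the sensitive action is never executed.
        return "cancelled"
    return execute(action)
```

Non-sensitive actions skip the prompt entirely, while a declined purchase returns without ever reaching `execute`.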
Beyond standard browser interactions, the tool can access terminal programs to interface with operating systems via scripting. This enables file editing operations and other system-level tasks through command execution.
"As demonstrated in our blog post, the model can choose between text-based and visual browsers for page navigation, download files from the web, manipulate files through terminal commands, and review results using the visual browser interface," noted OpenAI researchers.
Developed with a new AI architecture, the agent outperforms both the o4-mini and o3 models on specific reasoning tasks. In internal testing, the agent achieved 27.4% accuracy on the challenging FrontierMath benchmark, compared with 19.3% for o4-mini and 10.3% for o3.
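To put the FrontierMath figures in perspective, the short calculation below derives the percentage-point gap and the relative ratio from the three scores quoted above. The function name `compare` is purely illustrative, and the 10.3% score is attributed to o3, the second model named in the comparison.

```python
# FrontierMath accuracy (percent) as quoted in the article.
scores = {"ChatGPT Agent": 27.4, "o4-mini": 19.3, "o3": 10.3}


def compare(baseline: str, candidate: str = "ChatGPT Agent") -> tuple:
    """Return (percentage-point gap, ratio) of `candidate` over `baseline`."""
    gap = round(scores[candidate] - scores[baseline], 1)
    ratio = round(scores[candidate] / scores[baseline], 2)
    return gap, ratio


print(compare("o4-mini"))  # (8.1, 1.42)
print(compare("o3"))       # (17.1, 2.66)
```

By these numbers the agent leads o4-mini by 8.1 percentage points and scores roughly 2.7 times as high as o3.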
In spreadsheet analysis evaluations on the SpreadsheetBench benchmark, the agent scored roughly 25 percentage points higher than Microsoft's Copilot in Excel.
A comprehensive security framework has been implemented to prevent malicious use, with particular focus on detecting hidden malicious prompts in web pages. "We've trained and tested the agent to identify and resist prompt injections, complemented by monitoring systems for rapid detection and response to injection attacks," explained OpenAI engineers.
The ChatGPT Agent is now available for Pro, Plus, and Team tier subscribers.