Recently, OpenAI unveiled a "research preview" of an AI agent named Operator, which is capable of "browsing the web and executing tasks on behalf of users". According to OpenAI, Operator navigates the internet using its own built-in browser and interacts with websites by typing, clicking, and scrolling. The service was initially rolled out in the United States to ChatGPT Pro subscribers, who pay $200 per month.
Operator is powered by a model called Computer-Using Agent (CUA), which combines GPT-4o's vision capabilities with advanced reasoning trained through reinforcement learning, enabling it to interact with graphical user interfaces (GUIs). OpenAI explains that Operator can "see" a page (via screenshots) and "interact" with the browser (using all the actions a mouse and keyboard allow), so it can carry out operations on web pages without requiring custom API integrations.
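The screenshot-then-act cycle described above can be sketched as a simple loop. This is only an illustrative sketch, not OpenAI's implementation: `Action`, `FakeBrowser`, `run_agent`, and `scripted_policy` are hypothetical names, the browser is a stand-in that records actions instead of driving a real page, and the "model" is a fixed script rather than a vision model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One GUI action (hypothetical structure, not an OpenAI type)."""
    kind: str        # "click", "type", "scroll", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeBrowser:
    """Stand-in browser that logs actions so the sketch runs anywhere."""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> bytes:
        # A real agent would capture pixels; here the frame encodes
        # how many actions have been applied so far.
        return f"frame-{len(self.log)}".encode()

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            self.log.append(f"click({action.x},{action.y})")
        elif action.kind == "type":
            self.log.append(f"type({action.text!r})")
        elif action.kind == "scroll":
            self.log.append(f"scroll({action.y})")
        else:
            raise ValueError(f"unknown action: {action.kind}")


def run_agent(browser: FakeBrowser,
              policy: Callable[[bytes], Action],
              max_steps: int = 10) -> List[str]:
    """Observe a screenshot, ask the policy for an action, apply it, repeat."""
    for _ in range(max_steps):
        frame = browser.screenshot()
        action = policy(frame)
        if action.kind == "done":
            break
        browser.apply(action)
    return browser.log


def scripted_policy(frame: bytes) -> Action:
    """Stub for the model: maps each frame to a fixed next action."""
    script = {
        b"frame-0": Action("click", x=120, y=40),
        b"frame-1": Action("type", text="pizza near me"),
    }
    return script.get(frame, Action("done"))
```

Calling `run_agent(FakeBrowser(), scripted_policy)` returns the list of actions that were applied. The point of the design is that the loop only needs pixels in and mouse or keyboard events out, which is why no site-specific API is involved.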
Operator uses its reasoning capabilities for "self-correction", and it hands control back to the user when it gets stuck on a problem it cannot solve. When a website requires sensitive information (such as login credentials), Operator asks the user to take over; for actions like sending emails, it "should" seek user approval first. OpenAI also states that Operator is designed to "reject harmful requests and block inappropriate content".
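The handoff rules in this paragraph amount to a small decision table, which can be sketched as follows. This is a hedged illustration only: the `Decision` enum, the `gate` function, and the two category sets are hypothetical, and how Operator actually classifies a step is not described in the source.

```python
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    ASK_CONFIRMATION = "ask_confirmation"
    HAND_OVER = "hand_over"
    REFUSE = "refuse"


# Illustrative categories taken from the examples in the text above.
SENSITIVE_INPUTS = {"login_credentials"}
CONSEQUENTIAL_ACTIONS = {"send_email"}


def gate(step_kind: str, harmful: bool = False, stuck: bool = False) -> Decision:
    """Decide what the agent does before taking a step (assumed ordering)."""
    if harmful:
        return Decision.REFUSE            # reject harmful requests outright
    if stuck or step_kind in SENSITIVE_INPUTS:
        return Decision.HAND_OVER         # user takes control of the browser
    if step_kind in CONSEQUENTIAL_ACTIONS:
        return Decision.ASK_CONFIRMATION  # user approves before the action runs
    return Decision.PROCEED
```

For example, `gate("send_email")` yields `Decision.ASK_CONFIRMATION`, while `gate("login_credentials")` yields `Decision.HAND_OVER`. Checking for harmful requests first is an assumption of this sketch, chosen so that a refusal is never downgraded to a confirmation prompt.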
OpenAI is collaborating with companies including DoorDash, Instacart, OpenTable, Priceline, StubHub, Thumbtack, and Uber to ensure Operator meets real-world needs while adhering to established guidelines. However, the company warns that the tool currently struggles with "complex interfaces (like creating slides or managing calendars)", so it may not yet fully meet users' expectations.
In the future, OpenAI plans to extend Operator to Plus, Team, and Enterprise users and to integrate its capabilities into ChatGPT.