Anthropic Unveils New AI Model Feature: Claude Capable of Human-like Computer Control

2024-10-23

Anthropic recently unveiled two updated AI models and introduced a novel feature that allows its AI assistant, Claude, to operate a computer much as a human user would. The feature, termed "Computer Interaction" (Anthropic's own name for it is "computer use"), has entered public beta testing and lets Claude perform tasks by viewing the screen, moving the cursor, clicking, and typing. According to Anthropic, this makes Claude the first frontier AI model to offer such a capability in public beta.

Alex Albert, Anthropic's Director of Developer Relations, stated that the company did not create specialized tools for particular tasks. Instead, Claude was trained with comprehensive computer skills, allowing the AI to naturally utilize the software and tools commonly used in everyday activities.

The "Computer Interaction" feature combines the model's existing visual comprehension and logical reasoning abilities, building on earlier research into multimodal models and tool use. Claude begins by capturing screenshots of the computer screen, identifies the elements present, and works out its actions from pixel positions. By determining how many pixels the cursor must move vertically and horizontally, Claude can click on the correct spot. Accurate pixel counting is essential for reliable control; without it the model struggles to issue mouse commands, much as language models can stumble on seemingly simple text questions such as counting the letters in a word.
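The pixel arithmetic described above can be illustrated with a minimal Python sketch. It is not Anthropic's implementation: the Box type, the cursor_offset function, and the sample coordinates are all hypothetical, and the sketch assumes the target element's bounding box has already been located in the screenshot.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Bounding box of an on-screen element in screenshot pixels (hypothetical type)."""
    left: int
    top: int
    width: int
    height: int

    def center(self) -> Tuple[int, int]:
        # Integer midpoint of the box; clicking here avoids the element's edges.
        return (self.left + self.width // 2, self.top + self.height // 2)


def cursor_offset(cursor: Tuple[int, int], target: Box) -> Tuple[int, int]:
    """Return (dx, dy), the pixels to move horizontally and vertically.

    Positive dx means right, positive dy means down, matching the usual
    screen coordinate system with the origin at the top-left corner.
    """
    cx, cy = target.center()
    return (cx - cursor[0], cy - cursor[1])


# Illustrative values only: a button at (600, 400) measuring 80 x 30 pixels,
# with the cursor currently at (100, 100).
button = Box(left=600, top=400, width=80, height=30)
print(cursor_offset((100, 100), button))  # (540, 315)
```

An error of even a few dozen pixels in either component would land the click outside the button, which is why the article stresses pixel accuracy.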

During training, Claude was also taught to use basic software applications such as calculators and text editors, enabling the transfer of these skills to more complex programs. Although still in its early stages, this feature has demonstrated considerable flexibility and self-correction capabilities, allowing it to autonomously overcome obstacles.

Anthropic shared demonstrations showcasing Claude's operational abilities. For instance, Claude was able to search for relevant information across spreadsheets and customer relationship management (CRM) systems without human assistance and input the data into the required forms to fulfill vendor requests.

In another example, Claude took on a programming task, creating, modifying, and running a personal homepage with a 1990s aesthetic using both a web browser and an integrated development environment (IDE), and debugging errors along the way. When it hit a hurdle, the python command not being available on the machine, Claude quickly adapted by using python3 instead.

Anthropic noted that Claude currently struggles with basic operations that humans find effortless, such as scrolling and dragging. While the demonstrations were being recorded, Claude at one point unexpectedly broke off from its task to browse photos of Yellowstone National Park.

The announcement also introduced the upgraded Claude 3.5 Sonnet model, which shows significant improvements in coding, scoring 49% on the SWE-bench Verified benchmark. This surpasses competitors, including OpenAI's o1-preview. GitLab found that the upgrade improved performance on its software development tasks by approximately 10% without increasing latency, a substantial advantage for real-time coding work.

The newly added Claude 3.5 Haiku model matches the performance of Anthropic's previous top-tier models but offers lower costs and faster speeds. This model is set to be released later this month through Anthropic's API and major cloud service providers.

To ensure safety, Anthropic has implemented multiple measures, including a new system that detects potential misuse of the feature, such as spam or fraud. Additionally, AI safety institutes in the United States and the United Kingdom took part in pre-deployment testing of the upgraded model, which Anthropic says is held to the same safety standards as previous versions.

The "Computer Interaction" feature is currently available to developers in public beta through Anthropic's API and cloud services such as Amazon Bedrock and Google Cloud's Vertex AI. Several companies, including Asana, Canva, and DoorDash, are evaluating the technology for carrying out complex tasks.

Claude's computer skills still lag well behind human capabilities: it scored only 14.9% on the OSWorld benchmark, compared with a typical human score of 70-75%. Anthropic nevertheless anticipates rapid improvement in the coming months. Although still experimental, the feature opens new possibilities for AI, enabling it not only to handle text but also to carry out digital tasks in the real world.