In a recent pitch to investors, Anthropic outlined plans to develop AI-powered virtual assistants capable of autonomously performing tasks such as conducting research, answering emails, and handling other back-office work. The company described the technology as a "next-gen algorithm for AI self-teaching" and envisions it playing a key role in automating large sectors of the economy. That vision has taken time to materialize, but Anthropic has now introduced an upgraded version of its Claude 3.5 Sonnet model that begins to deliver on those promises.
The upgraded Claude 3.5 Sonnet can interact with any desktop application through a new "Computer Use" API, currently in open beta. The API lets the model emulate human inputs, including keystrokes, cursor movements, and mouse clicks, effectively simulating a person operating a computer. According to Anthropic, Claude analyzes screenshots of what is on screen and works out the cursor movements needed to click in the right place and complete a task. The capability is available through Anthropic's API, Amazon Bedrock, and Google Cloud's Vertex AI platform.
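In practice, developers drive this through a request loop. Below is a minimal sketch of a single turn using the Python anthropic SDK; the model name, tool identifier, and beta flag are the ones Anthropic documented for the open beta and may change in later SDK versions, and actually executing the returned actions (taking screenshots, moving the cursor, clicking) is left to the caller's own code.

```python
# One turn against the Computer Use beta via the Python `anthropic` SDK.
# Tool type and beta flag match the identifiers documented at launch;
# treat them as assumptions if your SDK version differs.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],
    tools=[{
        # The virtual display the model reasons about when it proposes
        # pixel coordinates for cursor movements and clicks.
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{
        "role": "user",
        "content": "Open the browser and search for round-trip flights to Lisbon.",
    }],
)

# The model replies with tool_use blocks describing concrete inputs,
# e.g. a screenshot request, a mouse move to x/y coordinates, or typed text.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```

From there the pattern is iterative: execute each proposed action, capture a fresh screenshot, return it to the model as a tool result, and repeat until the task succeeds or is abandoned.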
Automating desktop tasks is not a new idea, however. A range of companies, from long-established RPA vendors to newer startups, offer similar tools. "AI agent" remains a loosely defined term, but it generally describes AI systems that can automate software tasks, and major players such as Salesforce, Microsoft, and OpenAI are investing heavily in the category as they race to monetize the growing AI market. Anthropic argues its approach stands apart because it provides an "action-execution layer" that can carry out commands at the desktop level.
The new Claude 3.5 Sonnet is far from flawless, though. In tests involving tasks such as booking flights and initiating returns, the model succeeded less than half the time and struggled with basic actions like scrolling and zooming. Anthropic acknowledges these shortcomings and advises developers to start with low-risk tasks while experimenting. Even so, the company argues that releasing the model now lets it gather real-world data and improve the capability over time.
Anthropic has also put safety measures in place: it says the new model was not trained on users' data, including the screenshots it processes, and it has taken steps to discourage high-risk actions, such as interacting with government websites, partly to head off election-related misuse. The company concedes these safeguards are not foolproof, so developers using the Computer Use API are encouraged to take their own precautions, such as isolating Claude from sensitive data, to prevent unintended consequences.
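One way to act on that advice is to put a policy check between the model's proposed actions and the code that executes them. The sketch below is purely illustrative: the action shape, the deny-list, and the execute_action stub are hypothetical stand-ins, not part of Anthropic's API, but they show the general shape of a guardrail that refuses input touching sensitive targets.

```python
# Hypothetical guardrail for a Computer Use agent loop: screen each
# model-proposed action against a simple deny-list before executing it.
BLOCKED_TERMS = [".gov", "bank", "password"]  # example sensitive targets

def execute_action(action: dict) -> None:
    """Stand-in for real input synthesis (keystrokes, clicks, scrolls)."""
    print("executing:", action)

def is_allowed(action: dict) -> bool:
    """Reject any action whose payload mentions a blocked term."""
    payload = " ".join(str(v) for v in action.values()).lower()
    return not any(term in payload for term in BLOCKED_TERMS)

def guarded_execute(action: dict) -> None:
    if not is_allowed(action):
        raise PermissionError(f"Blocked by policy: {action}")
    execute_action(action)

# A plain search is allowed; a .gov address is refused.
guarded_execute({"action": "type", "text": "weather in Lisbon"})
try:
    guarded_execute({"action": "type", "text": "https://vote.gov"})
except PermissionError as err:
    print(err)
```

A stricter deployment would invert this into an allow-list and run the entire loop inside a disposable virtual machine or container with no credentials mounted, so even an unblocked mistake cannot reach sensitive data.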