TL;DR
Anthropic's computer use feature lets Claude see your screen, move the cursor, click, and type. Here is how it works, when to use it, and how to set it up.
Claude can control a computer the way you do. It takes screenshots to see what is on screen, moves the mouse, clicks buttons, and types text. No API integration required. If it is visible on the desktop, Claude can interact with it.
Anthropic released this as a beta feature, initially with Claude 3.5 Sonnet. It has since expanded to Claude Opus 4.5, Opus 4.6, Sonnet 4.6, and Haiku 4.5. On WebArena - a benchmark for autonomous web navigation across real websites - Claude achieves state-of-the-art results among single-agent systems.
This is not browser automation in the Playwright or Selenium sense. Those tools drive the browser programmatically through the DOM and browser protocols, with no visual understanding of the page. Computer use gives Claude eyes on the actual display and hands on the actual input devices.
The computer use tool provides four capabilities:

- Screenshot capture, so Claude can see the current state of the display
- Cursor movement to any coordinate on screen
- Mouse clicks, including left, right, double-click, and drag
- Keyboard input, both typed text and key combinations
The flow is simple. You send a message to the API with the computer use tool enabled. Claude decides it needs to see the screen, requests a screenshot, analyzes the image, then returns an action like "click at coordinates (450, 320)" or "type 'hello world'". Your application executes that action, takes a new screenshot, and sends it back. The loop continues until the task is complete.
```python
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=[
        {
            "type": "computer_20251124",
            "name": "computer",
            "display_width_px": 1024,
            "display_height_px": 768,
            "display_number": 1,
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "Open the calculator app and compute 1847 * 23",
        }
    ],
    betas=["computer-use-2025-11-24"],
)
```
The beta header is required. Use computer-use-2025-11-24 for the latest models.
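The request above only starts the conversation; your application still has to run the loop described earlier. Here is a minimal sketch of that side. The `execute_action` dispatch and the `executor` object are illustrative assumptions - swap in whatever actually drives your display (a VM, a Docker container, an input library) - while the API call mirrors the request shown above.

```python
# Sketch of the screenshot -> action -> result loop.
# `executor` is a stand-in for whatever drives your display -
# its methods here are illustrative, not part of the Anthropic SDK.

def execute_action(action, executor):
    """Translate one tool_use input into a concrete desktop action."""
    if action["action"] == "screenshot":
        return [{
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/png",
                "data": executor.screenshot_base64(),
            },
        }]
    if action["action"] == "left_click":
        x, y = action["coordinate"]
        executor.click(x, y)
    elif action["action"] == "type":
        executor.type_text(action["text"])
    # ... other actions: key presses, scrolling, drags
    return [{"type": "text", "text": "ok"}]

def run_loop(client, tools, messages, executor):
    """Keep exchanging actions and screenshots until the task is done."""
    while True:
        response = client.beta.messages.create(
            model="claude-opus-4-6",
            max_tokens=1024,
            tools=tools,
            messages=messages,
            betas=["computer-use-2025-11-24"],
        )
        tool_uses = [b for b in response.content if b.type == "tool_use"]
        if not tool_uses:
            return response  # no more actions requested: task complete
        messages.append({"role": "assistant", "content": response.content})
        messages.append({
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": execute_action(block.input, executor),
            } for block in tool_uses],
        })
```

Note that each iteration appends a full screenshot as a tool result, so long-running tasks accumulate image tokens quickly.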
Computer use shines for tasks that cross application boundaries. Things that would normally require a human to alt-tab, copy, paste, and click through UI flows.
Good fits:

- Workflows that span multiple applications with no shared API
- Legacy or internal tools that only expose a GUI
- Form filling, data entry, and UI testing that needs visual verification

Bad fits:

- High-volume, latency-sensitive automation, since each action takes seconds
- Fully deterministic tasks a simple script already handles
- Workflows that expose sensitive data you do not want on a model-visible screen
The sweet spot is visual tasks that require judgment. A script can click a button, but only a vision model can decide which button to click based on context.
This feature has real security implications. Claude can see everything on screen and control input devices. Anthropic recommends:

- Running computer use in a dedicated VM or container rather than on your primary machine
- Keeping sensitive data like passwords and account credentials out of the environment Claude can see
- Limiting internet access to an allowlist of trusted sites where possible
- Having a human confirm consequential actions before they execute
Anthropic added automatic classifiers that flag potential prompt injections in screenshots. If a webpage tries to trick Claude through on-screen text, the classifier catches it and asks for user confirmation before proceeding. You can opt out of this for fully autonomous use cases, but the default behavior adds an important safety layer.
Here is a real scenario. You need to pull data from a spreadsheet, enter it into a web form, verify the result, and log the outcome. Without computer use, you would build three integrations. With computer use:
```python
messages = [
    {
        "role": "user",
        "content": """
1. Open the Google Sheet in Chrome tab 1
2. Read the client names from column A
3. Switch to the CRM tab
4. For each client, search and update their status to 'Active'
5. Take a screenshot after each update for verification
""",
    }
]
```
Claude handles the tab switching, reading, typing, and verification visually. No Sheets API. No CRM API. Just screen interaction.
Computer use works alongside other Claude tools. Pair it with:

- The text editor tool, so Claude can view and modify files directly instead of clicking through an editor UI
- The bash tool, so shell commands can handle what scripting does better than mouse movement
The reference implementation from Anthropic includes a Docker container with all three tools configured together. It is the fastest way to experiment.
```shell
git clone https://github.com/anthropics/anthropic-quickstarts.git
cd anthropic-quickstarts/computer-use-demo
docker compose up
```
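Enabling all three tools in one request just means listing them together in the tools array. A sketch, using the tool type strings from the original computer-use-2024-10-22 beta - newer model generations use updated versions (such as the `computer_20251124` type shown earlier), so check the docs for the strings matching your model:

```python
# Pair computer use with the text editor and bash tools in one request.
# Type strings below are from the original 2024-10-22 beta; newer model
# generations use updated versions.
tools = [
    {
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
        "display_number": 1,
    },
    {"type": "text_editor_20241022", "name": "str_replace_editor"},
    {"type": "bash_20241022", "name": "bash"},
]
```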
Computer use keeps improving with each model release. Haiku 4.5 actually surpasses Sonnet 4 at computer use tasks while running at a fraction of the cost. The trajectory is clear: faster, cheaper, more reliable desktop interaction with every generation.
For developers building automation tools, the implication is significant. Any application with a UI is now an application with an API - you just need to point Claude at the screen.
Computer use is available through the Claude API with standard per-token pricing. There is no additional charge for the computer use capability itself. You pay for the tokens in your messages, including the base64-encoded screenshots that get sent back and forth.
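Screenshots dominate the bill. Anthropic's vision documentation gives roughly (width × height) / 750 tokens per image, so you can estimate a task's input cost from its step count. A back-of-the-envelope sketch (the helper names are mine, and the formula is an approximation, not a billing guarantee):

```python
# Estimate the screenshot-driven input token cost of a computer use task,
# using Anthropic's approximate formula of ~(width * height) / 750 tokens
# per image. This is an estimate, not a billing guarantee.

def screenshot_tokens(width_px: int, height_px: int) -> int:
    """Approximate tokens consumed by one screenshot at this resolution."""
    return round(width_px * height_px / 750)

def loop_input_tokens(steps: int, width_px: int = 1024, height_px: int = 768) -> int:
    """Each loop step sends one new screenshot back as input."""
    return steps * screenshot_tokens(width_px, height_px)

print(screenshot_tokens(1024, 768))  # ~1049 tokens per 1024x768 screenshot
print(loop_input_tokens(20))         # a 20-step task: ~20980 input tokens
```

In practice the conversation accumulates, so earlier screenshots get re-sent as context on every turn unless you prune them - real costs run higher than this per-step estimate.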
Claude Code has also integrated computer use directly, so you can ask it to interact with desktop applications alongside its normal file editing and terminal capabilities. This is separate from the Chrome automation feature, which specifically targets browser interaction.
Claude can control either your actual desktop or a virtual machine - both work - but Anthropic strongly recommends a sandboxed environment like a VM or Docker container for safety. The reference implementation provides a Docker setup out of the box.
On speed, computer use is slower than direct API calls or scripted automation. Each step requires a screenshot capture, image analysis, and action execution. Expect 2-5 seconds per action depending on the model and screenshot resolution. The tradeoff is flexibility - computer use works with any application without integration code.
Claude Opus 4.6, Sonnet 4.6, Opus 4.5, Sonnet 4.5, Haiku 4.5, and earlier Claude 4 models all support computer use. Haiku 4.5 is particularly notable - it surpasses larger models on computer use benchmarks while being significantly faster and cheaper.