TL;DR
OpenAI has merged its browsing capabilities with deep research into a single agent that can take action on the web, generate spreadsheets and slide decks, and handle complex multi-step tasks from start to finish.
OpenAI has merged its web browsing capabilities with deep research into a single product: the ChatGPT Agent. This is a combination of what Operator could do - interacting with websites, clicking buttons, filling forms - with the synthesis and analytical depth of deep research. The result is an agent that can handle complex, multi-step tasks from start to finish.
The ChatGPT Agent can both research and act. Previous iterations forced a choice: use deep research for information synthesis, or use Operator for website interactions. The agent combines both capabilities into a unified workflow.
In practice, the agent handles such tasks by spawning browsing sessions, synthesizing information from multiple sources, and producing structured output - whether that is a spreadsheet, a PowerPoint presentation, or a formatted summary.
Under the hood, the ChatGPT Agent operates with two distinct browsing modes. The first is a text browser that handles standard web searches and page summarization. It can read PDFs, parse article content, and extract data from structured pages. This is the research side of the equation.
The second is an interactive browser that activates when actions are required. If the agent needs to click through a checkout flow, fill out a reservation form, or navigate a multi-step process that requires real browser interactions, it switches to a full visual browser session. You can watch it navigate in real time.
The visual UI shows which tools the agent is using at any given moment. You see it switch between searching, reading, summarizing, and interacting - creating a fluid workflow that adapts to whatever the task demands.
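The two-mode design described above amounts to a routing decision: read-only research goes to the text browser, and anything requiring clicks or form input escalates to the interactive browser. A minimal sketch of that dispatch logic, with a purely illustrative keyword heuristic (nothing here reflects OpenAI's actual implementation):

```python
def choose_browser_mode(task: str) -> str:
    """Route a task to the text browser (read-only research) or the
    interactive browser (clicking, form-filling, checkout flows).

    The verb list and string-matching heuristic are assumptions for
    illustration, not how ChatGPT Agent actually decides.
    """
    action_verbs = {"click", "fill", "submit", "buy", "checkout", "reserve"}
    needs_interaction = any(verb in task.lower() for verb in action_verbs)
    return "interactive" if needs_interaction else "text"
```

A real router would let the model itself choose the tool per step rather than keyword-matching, but the shape of the decision - cheap text fetching by default, a full browser session only when actions are required - is the same.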
Beyond text responses, the agent generates structured artifacts:
- **Spreadsheets** - The agent can create Excel files from research data. Ask it to compile a comparison of SaaS tools with pricing, features, and user ratings, and it outputs a formatted spreadsheet you can download and use directly.
- **Slide Decks** - PowerPoint generation is built in. The agent researches a topic, structures the information into slides with appropriate visuals, and delivers a presentation-ready file. This is not placeholder content with bullet points - the slides include sourced data and formatted layouts.
- **Recurring Tasks** - You can schedule the agent to run automatically at specified intervals. A morning news digest, a weekly financial summary of specific stocks, or a daily competitor monitoring report can all run on their own schedule.
The benchmarks reveal why OpenAI felt confident shipping this as a distinct product rather than an incremental update.
On Humanity's Last Exam, the agent scores 41.6%, surpassing Grok 4's previous leading result. What makes this benchmark particularly interesting is the progression chart: OpenAI plots results from o3 with no tools through ChatGPT Agent with browsing, computer use, and terminal access. The trend is clear: equipping models with more capabilities produces compounding improvements, much as a human with access to a calculator, reference books, and the internet would outperform one working from memory alone.
FrontierMath and DSBench (a data science task benchmark) also show state-of-the-art results. The DSBench numbers are particularly relevant because they test agents on realistic data analysis and modeling workflows - the kinds of tasks the ChatGPT Agent is explicitly designed for.
SpreadsheetBench is a newer benchmark that evaluates agents on spreadsheet manipulation tasks. ChatGPT Agent scores 45.7% with XLSX access, compared to a human baseline of 71.3%. Not parity, but a substantial jump from where these capabilities stood even months ago.
WebArena measures agentic browser use, and results show the gap between AI browser agents and human web navigation continuing to close. Combined with the BrowseComp leap from 55.5% (deep research) to 68.9% (ChatGPT Agent), the data suggests that merging research and action capabilities produces more than the sum of its parts.
Investment banking modeling benchmarks also show major gains over o3, which was the state-of-the-art model just months ago. The speed of progression on these specialized financial analysis tasks underscores how quickly the field is advancing.
OpenAI emphasizes that users remain in control throughout any agent session. You can interrupt at any point - useful when the agent approaches sensitive actions like entering payment information or navigating to websites you have not authorized.
This is a real consideration, not just a disclaimer. The agent operates in a new browsing paradigm where an AI is actively navigating the web and potentially interacting with forms and services on your behalf. Being mindful about what information the agent has access to - credit card details, login credentials, personal data - is important as this modality matures.
The rollout follows OpenAI's tiered approach:
| Tier | Price | Agent Messages/Month |
|---|---|---|
| Pro | $200/mo | 400 |
| Plus | $20/mo | 40 |
| Team | Varies | Rolling out |
Pro and Team members get access first, with Plus users following within days. The rate limits are notable: even at the $200 tier, you get 400 agent messages per month, which means roughly 13 per day. For the Plus tier, 40 messages per month translates to about one or two per day - enough to test the capabilities but not enough to make it a daily workhorse.
One of the more practical features is the ability to schedule recurring agent tasks. You can configure the agent to run specific workflows on a schedule, such as the news digests and monitoring reports mentioned earlier.
This moves the ChatGPT Agent from a reactive tool (you ask, it answers) to a proactive system that delivers value without requiring your attention. The scheduled tasks run in the background and deliver results to your inbox or ChatGPT conversation history.
For anyone who has built similar automation with tools like Zapier or custom scripts, the appeal is obvious: natural language configuration instead of workflow builders and API integrations.
The 40 messages per month on the Plus tier is the most significant practical constraint. That is roughly one agent task per day, which means you need to be deliberate about what you ask the agent to handle. Complex multi-step tasks that would normally take several back-and-forth messages count against this quota.
The agent also inherits the limitations of web browsing AI. Sites with aggressive bot detection, CAPTCHA challenges, or complex authentication flows can trip up the interactive browser. Login-gated content remains tricky unless you are already authenticated in the session.
Response time varies significantly based on task complexity. A simple web search and summary might complete in under a minute. A comprehensive competitive analysis with spreadsheet output could take several minutes as the agent navigates multiple sites, synthesizes information, and generates structured output.
The ChatGPT Agent represents a convergence pattern we are seeing across the industry: the merging of research, reasoning, and action into unified agent experiences. Google, Anthropic, and xAI are all moving in similar directions.
For developers building AI-powered applications, the key takeaway is the tool-use architecture. Models equipped with browsing, terminal access, and structured output capabilities consistently outperform models running in isolation. This validates the agent framework approach - not just for end-user products like ChatGPT, but for developer tooling where AI agents coordinate multiple capabilities to accomplish complex tasks.
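The tool-use architecture the paragraph describes reduces to a loop: the model proposes a tool call, a harness executes it, and the result is fed back until the model produces a final answer. A minimal, self-contained sketch (the message format, tool names, and the stand-in `model` callable are all illustrative assumptions, not any vendor's API):

```python
from typing import Callable

def run_agent(
    model: Callable[[list], dict],
    tools: dict[str, Callable],
    query: str,
    max_steps: int = 10,
) -> str:
    """Drive a tool-using agent loop until the model returns an answer.

    `model` takes the conversation history and returns either
    {"tool": name, "args": {...}} to request a tool call, or
    {"answer": text} to finish.
    """
    history = [{"role": "user", "content": query}]
    for _ in range(max_steps):
        step = model(history)
        if "answer" in step:
            return step["answer"]
        # Execute the requested tool and feed the result back to the model.
        result = tools[step["tool"]](**step["args"])
        history.append({"role": "tool", "name": step["tool"], "content": result})
    return "max steps exceeded"
```

Production frameworks add schema validation, parallel tool calls, and safety checks around the tool execution step, but the coordination pattern - and why adding browsing or terminal tools compounds model capability - is visible even in this skeleton.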
The benchmark trends also reinforce something practitioners have observed: the gap between AI capabilities and human performance on complex, real-world tasks is closing faster than most people expected, particularly when agents have access to the right tools.
For teams evaluating whether to build their own agent systems or leverage platforms like ChatGPT Agent, the calculus depends on control requirements. If you need deterministic behavior, custom tool integrations, and fine-grained control over the agent's decision-making process, building your own agent stack remains the better path. If you need general-purpose research and action capabilities without the engineering overhead, the ChatGPT Agent provides a ready-made solution that is improving rapidly.