
Introducing ChatGPT agent: bridging research and action

OpenAI has launched a new ChatGPT agent that combines operational capabilities with deep research functionalities. This video provides an overview of its features, including generating spreadsheets and slideshows, performing web searches, and handling complex tasks like planning and analysis. Demonstrations showcase its user interface and interactive browser capabilities. The video also discusses benchmarks where ChatGPT agent excels, such as human exams and data science tasks. Viewers will learn about potential limitations, how to get started, and availability details for Pro and Plus users.

Blog link: https://openai.com/index/introducing-chatgpt-agent/

00:00 Introduction to OpenAI's ChatGPT Agent
00:36 Capabilities and Features of the ChatGPT Agent
01:48 Examples and Demonstrations
02:57 Benchmark Performance and Analysis
05:16 Availability and Subscription Details
05:40 Conclusion and Call to Action
---
type: transcript
date: 2025-07-17
youtube_id: kaMT5o2vI64
---

# Transcript: ChatGPT Agent in 6 Minutes

OpenAI has just released ChatGPT agent. This is an agent that allows you to both take action and do research on the web. Effectively, it is a combination of Operator and deep research, with some other nice features like being able to generate spreadsheets and slideshows. In this video, I'm going to go over the blog post and some of the benchmarks, and then show you some of the capabilities. By the end of the video, you'll have an idea of how you can get started, as well as some of the potential limitations to be aware of.

First up, just to touch on some of the aspects within the blog post. They describe this as ChatGPT now being able to do work for you using its own computer, handling complex tasks from start to finish. That covers things like looking at your calendar and briefing you on upcoming client meetings based on recent news, planning ingredients to make a certain type of breakfast, or analyzing different competitors and creating a slide deck.

The thing to note is that if you've used Operator, the agent has those same capabilities of actually being able to interact with a website. Paired with that, it also has deep research's skill in synthesizing information and ChatGPT's intelligence to get the results you ask for.

Another aspect of ChatGPT agent is that you are in control. At any point you can jump in. Say you need to put in your credit card information, or it's doing something that you don't want it to do: you can easily interrupt and take over, or stop any task at any point.
Now, one thing that they did emphasize is to be mindful of some of the risks. You might want to be careful about where you're putting your credit card information or which websites the agent is navigating to, because this is almost a new modality for how we browse the web. You do have to be a little more mindful that the agent could potentially go to websites and enter information that you might not want it to.

Now, just to show you some examples, this is what it looks like. You can select agent mode within ChatGPT. Here's an example of it creating a spreadsheet. It has the ability to search the web, something like a Google search, and it can synthesize all of those different page results for you. One thing that is nice is that it has a really good visual UI. It's able to read PDFs and different websites, and within all of that you see this fluid UI where it goes back and forth between searching and summarizing.

So the agent has a text browser, like you see on the screen here, where it can search for different things or summarize different pages. But if any action needs to occur, like if there are interaction barriers or you need to click on certain things to, say, make a reservation, it also has an interactive browser just like this. The other thing to note is that you can see visually the different tools that you give it access to, just like you see here.

Another really powerful feature is the ability to generate PowerPoint slides. Similar to the Excel functionality, where it can do all of that research, it can output the results as a PowerPoint with visuals, or you can use it as the basis for a presentation on whatever topic you're researching.
Now, in terms of some of the benchmarks: on Humanity's Last Exam, ChatGPT agent scores a 41.6. This is an increase over Grok 4, which just came out. What is really interesting with this benchmark is that we see the different results plotted from o3 with no tools all the way through to ChatGPT agent with the browser, the computer, and the terminal. It goes without saying that as you equip the LLM with more capabilities, like the terminal or being able to browse the web, you get agents that, similar to a human, can go off and actually perform the relevant tasks. Having a terminal where it can do things like math equations more deterministically, paired with the ability to research the web just like a human would, ultimately adds up to the state-of-the-art score on Humanity's Last Exam here.

Now, in terms of some other benchmarks, we see state-of-the-art results on FrontierMath as well as on DSBench, which is designed to evaluate agents on realistic data science tasks spanning data analysis and modeling.

One really interesting benchmark that I haven't seen before is SpreadsheetBench. With these results, we actually have the human baseline here. ChatGPT agent with XLSX access scores 45.7%, compared with 71.3% for humans. If you're interested in any of these benchmarks, I'll put links in the description of the video where you can click through and read up on the methodologies and get a deeper analysis of each one.

Now, in terms of investment banking modeling analysis, this was another interesting one. Its capabilities have again increased substantially over o3, which just a number of months ago was the state-of-the-art model.
This just shows again how quickly things are progressing in the field of AI. For BrowseComp, this is another huge leap over deep research, from 55.5% all the way up to 68.9%. And then, similar to the SpreadsheetBench result, we can see on WebArena that agentic browser use is approaching human capabilities.

Now, another thing worth highlighting is that you can set tasks for the agent to recur automatically. Let's say you want an update on the latest AI news every morning, or a financial report every Friday on particular stocks that you're interested in. You can set the agent to generate all of those things at certain intervals.

Now, in terms of availability, it is rolling out to Pro, Plus, and Team users today. Pro users will be getting this by the end of the day, while Plus and Team users will be getting it over the next few days.

Now, in terms of rate limits, if you're a Pro member, which is their $200 a month tier, you are going to be able to send 400 messages per month. Alternatively, if you are on the Plus tier, their $20 a month tier, you are going to get 40 messages per month for this.

Hopefully, you found this useful. If you did, please comment, share, and subscribe. Otherwise, until the next
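
To put a couple of the figures quoted in the video side by side, here is a minimal sketch of the arithmetic. The numbers are taken as spoken in the transcript. The function names are made up for illustration, and dividing the whole subscription price by the agent message allowance is a simplifying assumption, since both plans include much more than agent mode.

```python
# Figures as quoted in the transcript; helper names are hypothetical.
SPREADSHEETBENCH = {"agent_xlsx": 45.7, "human": 71.3}  # scores in percent
PLANS = {"Pro": (200, 400), "Plus": (20, 40)}  # (USD per month, agent messages per month)


def fraction_of_human(agent_score: float, human_score: float) -> float:
    """How far the agent's score gets toward the human baseline (0..1)."""
    return agent_score / human_score


def cost_per_message(price_usd: float, messages: int) -> float:
    """Upper bound on cost per agent message, assuming the entire
    subscription price is attributed to agent usage (an assumption)."""
    return price_usd / messages


if __name__ == "__main__":
    ratio = fraction_of_human(SPREADSHEETBENCH["agent_xlsx"], SPREADSHEETBENCH["human"])
    print(f"SpreadsheetBench: {ratio:.1%} of the human baseline")
    for plan, (price, messages) in PLANS.items():
        print(f"{plan}: ${cost_per_message(price, messages):.2f} per message at most")
```

Read this way, the SpreadsheetBench result puts the agent at roughly 64% of the human baseline, and both tiers work out to the same ceiling of $0.50 per agent message if the allowance is fully used.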