
In this video, we dive into Anthropic's latest release, Claude Opus 4.5, touted as the best model for coding, agents, and computer use. We review the blog post and its significant announcements, such as the model's cost-efficiency and performance benchmarks. Highlights include its superior performance on software engineering tasks, multi-language capabilities, and new features in the Claude API, such as effort control. A demonstration shows its efficiency in handling complex tasks autonomously. The video concludes with insights into the model's real-world applications and its ability to outperform human candidates in technical assessments.

- 00:00 Introduction to Claude Opus 4.5
- 00:25 Key Announcements and Pricing
- 00:45 Benchmark Performance and Efficiency
- 01:32 New Features and Capabilities
- 02:34 Claude Code and Desktop App Integration
- 03:33 Demo: Creating a SaaS Landing Page
- 04:16 First Impressions and User Feedback
- 05:06 AI's Impact on Engineering
- 05:20 Conclusion and Call to Action
---
type: transcript
date: 2025-11-24
youtube_id: TrouQWADTU4
---

# Transcript: Anthropic's Claude Opus 4.5 in 5 Minutes

Anthropic has just released Claude Opus 4.5, a more intelligent and efficient model, and they claim it's the best at coding, agents, and computer use. In this video, I'm going to go over the blog post and touch on some of the notable bits of today's release. Then at the end of the video, I'll show you a quick demo of the model within Claude Code.

First things first: on software engineering benchmarks, as you might expect, Opus 4.5 is best-in-class. Now, one of the big announcements as part of the release today is that Opus is now three times cheaper, at $5 per million input tokens and $25 per million output tokens. You're going to be able to access this model basically everywhere: within the web app, within Claude Code, as well as through all of the major cloud providers.

Now, if we take a quick look at the benchmarks, the model leads on basically all of the agentic coding benchmarks, like SWE-bench Verified and Terminal-Bench, and it's also considerably better at computer use. One of the things they did mention as part of the announcement is the efficiency of the model: it doesn't have to spend a ton of tokens to get that performance, and that's one of the key metrics. It's one thing if you can spend a ton of tokens to get state-of-the-art performance, but it's a whole other story if you're able to get to the same result efficiently.

If we take a closer look at some of the other benchmarks: on SWE-bench Multilingual, we can see that, again, this model outperforms all of the previous versions. On Polyglot, we have 89.4%. On BrowserComp, we have 72.9%. And on Vending-Bench, we have $4,967, which I think is still shy of Gemini 3 Pro.
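The new pricing is easy to sanity-check. Here is a minimal sketch of the per-request cost math at the rates quoted above ($5/M input, $25/M output); the token counts are made-up illustration values, not numbers from the release:

```python
# Per-token rates derived from the quoted Opus 4.5 pricing:
# $5 per million input tokens, $25 per million output tokens.
INPUT_RATE = 5.00 / 1_000_000    # dollars per input token
OUTPUT_RATE = 25.00 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at these rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical request: a 20k-token prompt with a 4k-token response.
print(f"${request_cost(20_000, 4_000):.2f}")  # prints $0.20
```

At the old (3x) prices the same request would have cost about $0.60, which is the kind of difference that adds up quickly for long-running agents.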
Now, additionally, there is a new effort parameter within the Claude API, which allows you to decide how much effort the model spends on each task. This gives you further control over how you actually want to leverage the model. And here's one of the key metrics: Opus 4.5 matched Sonnet 4.5's best score on SWE-bench Verified, but it used 76% fewer output tokens. At its highest effort level, Opus 4.5 exceeds Sonnet 4.5's performance by 4.3 percentage points while using 48% fewer tokens. Basically, regardless of the mode that you choose, this is a very efficient model.

They also highlighted Opus 4.5's performance with sub-agents: they mention it is very effective at managing a team of sub-agents, enabling the construction of complex, well-coordinated multi-agent systems. And one thing that is very clear with the models coming out is that the focus is increasingly on their agentic capabilities. How much can we actually trust these systems? How long can they run autonomously without a lot of intervention? It is pretty interesting to see some of that evolution.

Now, next up, they also have Claude Code within the desktop app. If you are a fan of the Claude web app, you will now be able to have the same experience within the desktop app. Personally, I think I'm still going to prefer the CLI version of Claude Code when I use it. Additionally, they also have support within Microsoft PowerPoint, Excel, and Word, in addition to rolling out expanded support for their Chrome extension.

Now, before I dive into a quick demonstration, one thing they did flag as part of the announcement is that Opus 4.5 handles ambiguous tasks and reasons about trade-offs without handholding. In other words, you're going to be able to trust the system without having to give it constant guidance.
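To make the effort control concrete, here is a rough sketch of what a request body with an effort setting might look like. The exact field name, accepted values, and model ID below are assumptions for illustration, not confirmed details from the release; check Anthropic's API reference for the real shape:

```python
import json

# Sketch of a Messages API request body with an effort setting.
# NOTE: "effort" as a top-level field, the "high" value, and the model
# ID are all assumptions for illustration; consult the official API
# reference for the actual request shape.
payload = {
    "model": "claude-opus-4-5",
    "max_tokens": 1024,
    "effort": "high",  # hypothetical: trade extra tokens/latency for quality
    "messages": [
        {"role": "user", "content": "Refactor this function for clarity."},
    ],
}

print(json.dumps(payload, indent=2))
```

Per the numbers above, lowering the effort would be the knob for getting near-Sonnet-4.5 quality at a fraction of the output tokens.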
So, it does seem like the model has good intuition about which direction to go depending on the instructions you give it. Now, just a handful of other product features: they announced longer conversations, and there is also new tool search, programmatic tool calling, tool use examples, effort control (like I mentioned), and context compaction as part of the API release.

Now, to give a really quick demonstration of the model: here I have Claude Code, set to Opus 4.5. What I'm going to say is: create a beautiful SaaS landing page with a glassmorphism theme. Let's have the primary colors be black, white, and blue. I'll also say I want this to be within a Next.js application. I'm going to go ahead and kick this off. Now, one thing to note is that this task took about five minutes to accomplish. Here is what it generated for us. While I don't particularly love all the different things it has in here, I didn't actually give it too much instruction. One thing I'd encourage you to do is try it out with a photo, and let me know if you have better results. This model is supposed to be quite good at image understanding, so being able to pass in something like a screenshot from Figma or what have you: I would be curious to see the results from something like that.

But otherwise, I just wanted to touch on a couple of other aspects from the blog post. In terms of first impressions, the one thing I've seen online from early testers of the model, as well as from Anthropic's own internal team, is that Opus 4.5 just, quote-unquote, gets it. You're able to hand off a task to this system without having to hold its hand, and it performs quite well without needing much intervention. Now, one really interesting thing: Anthropic, as you might expect, has a notoriously difficult take-home exam.
Opus 4.5 is the first model that has actually scored higher than any human candidate ever. That is something in and of itself. The take-home test is designed to measure technical ability and judgment under time pressure, and it doesn't test for other crucial skills that candidates may possess, like collaboration, communication, or instincts that develop over years. But this result, where an AI model outperforms strong candidates on important technical skills, raises questions about how AI will change engineering as a profession. They then touch on some of the studies and research aimed at understanding these kinds of impacts across different fields.

That's pretty much it for this video. If you found it useful, please like, comment, share, and subscribe. Otherwise, until the next one.