
In this video, we dive into the latest releases from Anthropic with an in-depth look at Claude Opus 4 and Claude Sonnet 4. We discuss the key features and improvements of these hybrid models, their capabilities in coding and reasoning, and their integration with tools like GitHub and various APIs. We also explore real-world use cases and provide insights into their performance benchmarks. Stay tuned as we reveal how these models are pushing the boundaries of AI in software engineering and beyond.

Today, Anthropic introduced the next generation of Claude models: Claude Opus 4 and Claude Sonnet 4, setting new standards for coding, advanced reasoning, and AI agents. Claude Opus 4 is the world's best coding model, with sustained performance on complex, long-running tasks and agent workflows. Claude Sonnet 4 is a significant upgrade to Claude Sonnet 3.7, delivering superior coding and reasoning while responding more precisely to your instructions. https://www.anthropic.com/news/claude-4

00:00 Introduction to Claude Opus and Sonnet 4
00:48 Key Features and Capabilities
01:24 Claude Code and SDK Announcements
02:17 API Enhancements and Integrations
03:08 Access and Pricing Details
03:40 Performance and Testimonials
05:43 Model Training and Relevance
07:14 Future Content and Demos
07:29 Example Use Cases
09:11 Conclusion and Final Thoughts
---
type: transcript
date: 2025-05-23
youtube_id: pLAWNJjAdPw
---

# Transcript: Claude 4 Sonnet & Opus in 9 Minutes

All right, the best coding model in the world is now here. Just today, Anthropic released two new models: Claude Opus 4 and Claude Sonnet 4. In this video, I'm going to go over some aspects of the blog post, touch on some aspects within the model card, and then point you in the direction of some of the other capabilities they announced today in addition to the model release. They don't mince words within the blog post. They mention that Claude Opus 4 is the world's best coding model, with sustained performance on complex, long-running tasks and agent workflows. Claude Sonnet 4 is a significant upgrade to Claude Sonnet 3.7, delivering superior coding and reasoning while responding more precisely to your instructions. One thing that Dario Amodei, the CEO of Anthropic, mentioned when he announced these models was that with Claude 4 you shouldn't see the type of overeagerness you might have experienced with Claude Sonnet 3.7. Now, one really cool thing is that both of these are hybrid models. So, similar to Claude 3.7, they have an extended thinking mode where the model can think through a problem before it ultimately responds, and that correlates with better performance. But if you want near-instant responses, you have that as an option as well. One of the really cool things is you're actually going to be able to invoke tools during the thinking process. Instead of just having that stream of thinking tokens, the model will be able to access things like web search to improve its overall responses. This one in particular I was quite excited to see. Both models can also use tools in parallel. Also, as part of today's announcement, Claude Code is generally available.
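The hybrid thinking mode described above is toggled per request. Here's a minimal sketch of what such a request payload might look like, assuming the Anthropic Messages API's `thinking` parameter; the model id and token budgets are assumptions, so check Anthropic's docs before sending a real request:

```python
# Sketch of an extended-thinking request payload for the Anthropic Messages API.
# The model id and budgets below are assumptions from the announcement, not
# verified values; no request is actually sent here.

def build_thinking_request(prompt: str, budget_tokens: int = 4096) -> dict:
    """Build a Messages API payload with extended thinking enabled."""
    return {
        "model": "claude-opus-4-20250514",   # assumed model id
        "max_tokens": 8192,                  # must exceed the thinking budget
        "thinking": {
            "type": "enabled",
            "budget_tokens": budget_tokens,  # tokens reserved for reasoning
        },
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_thinking_request("Refactor this module to remove duplication.")
```

Omitting the `thinking` block would give you the near-instant, non-reasoning behavior instead, which is what makes these hybrid models.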
And one great thing with this is you can now support background tasks via GitHub Actions. In other words, if you want Claude Code to action something on an issue or a PR, you'll be able to leverage Claude Code with their new SDK to perform things like GitHub Actions and to more natively integrate into platforms. Just earlier this week, Claude Code announced an SDK. Instead of only being able to interact with it from the CLI, they're going to introduce both a Python and a TypeScript library and expose it so you can run it within your own environments. There are going to be a ton of potential use cases for leveraging this within different workflows. Another great thing is they now have native integrations within both VS Code and JetBrains. Finally, they also announced four new capabilities within their API: the code execution tool, the MCP connector, the Files API, and the ability to cache prompts for up to one hour. So, you can now execute code directly from the Claude API. They also have an MCP connector on the Anthropic API. Basically, what this allows is that you can connect to external MCP servers, and the Claude API will handle things like connection management, tool discovery, and the actual invocation of those tools if they're determined to be used. In addition to that, they announced a Files API. Instead of having to repeatedly upload files, you can upload them once and store them within the Anthropic API. And finally, they announced extended prompt caching, so developers can now choose anywhere from 5 minutes to an hour for the prompt caching window. Now, in terms of being able to access the models, you can access them on the Pro, Max, Team, and Enterprise plans.
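To make the extended prompt caching concrete, here's a sketch of a request payload that marks a large, reusable system prompt as cacheable. The `cache_control` shape and the `"ttl"` field are assumptions based on the announcement, as is the model id; verify the exact parameter names against Anthropic's API documentation:

```python
# Sketch of extended prompt caching on a Messages API payload.
# The "ttl" field and model id are assumptions, not verified API values.

def build_cached_request(system_prompt: str, question: str) -> dict:
    """Build a payload whose system prompt is marked cacheable for ~1 hour."""
    return {
        "model": "claude-sonnet-4-20250514",  # assumed model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Mark this large, reusable block as cacheable; "1h" is the
                # assumed spelling of the new extended (1-hour) TTL option.
                "cache_control": {"type": "ephemeral", "ttl": "1h"},
            }
        ],
        "messages": [{"role": "user", "content": question}],
    }
```

The win is that repeated requests sharing the same cached prefix (a big system prompt, a codebase summary, uploaded file context) don't pay full input-token price each time.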
You will be able to try out Sonnet 4 on the completely free tier as well if you're interested. Now, if you're interested in trying it out on the API, it is released same day. You can get it from the Anthropic API, and you can also get it on AWS Bedrock as well as GCP's Vertex AI. In terms of pricing, Opus 4 is $15 per million input tokens and $75 per million output tokens, with Sonnet 4 at $3 per million input tokens and $15 per million output tokens. In terms of the benchmarks for SWE-bench as well as Terminal-Bench, we see both Opus and Sonnet absolutely crush the competition. When we compare on the agentic coding task against something like o3 or Gemini 2.5 Pro, we can see considerably improved performance. They have a handful of testimonials in here. Cursor calls it state-of-the-art at coding and a leap forward in complex codebase understanding. But the one thing that really stood out is that Rakuten validated its capabilities with a demanding open-source refactor running independently for 7 hours with sustained performance. If we just think about that, this line in particular is really interesting in terms of the implications. Basically, think about sending off an employee that's just going to work diligently at whatever task you throw at it for hours on end. You can think of this as spinning up a software engineer. In terms of the performance on that particular task, this shows both the capabilities of the model and the direction of how LLMs are going to be leveraged, especially in the context of coding: we're really moving into an agentic era. Instead of having coding models just complete the next line or the next function block, we now have LLMs that can build entire applications with increasing accuracy while understanding larger and larger codebases.
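Those per-million-token prices make cost estimates easy to sketch. Here's a small calculator using the figures quoted above, with the 50% batch-processing discount (mentioned later in the video) as an option:

```python
# Cost estimator using the per-million-token prices quoted above.
PRICES = {  # model: (input $/Mtok, output $/Mtok)
    "opus-4": (15.00, 75.00),
    "sonnet-4": (3.00, 15.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  batch: bool = False) -> float:
    """Return the dollar cost of a request; batch processing is 50% off."""
    in_price, out_price = PRICES[model]
    cost = (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price
    if batch:
        cost *= 0.5
    return round(cost, 6)

# 100k input + 10k output tokens:
print(estimate_cost("opus-4", 100_000, 10_000))    # → 2.25
print(estimate_cost("sonnet-4", 100_000, 10_000))  # → 0.45
```

That 5x gap per token is why Sonnet 4 is likely the default for high-volume work, with Opus 4 reserved for the hardest long-running tasks.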
We're probably going to see this model used across the board given its price-to-performance ratio and its availability within tools like Cursor or GitHub. And basically, they come to the same conclusion: these models advance customers' AI strategies across the board. Opus 4 pushes boundaries in coding, research, writing, and scientific discovery, while Sonnet 4 brings frontier performance to everyday use cases as an instant upgrade to Sonnet 3.7. Next, here's a really great example of Claude Code and the GitHub Action. What you can do is assign Claude to address feedback for a particular task. It's going to gather the context, address the feedback, and create a pull request that you can then review and merge within your project. That's just to give you yet another idea of the direction where things are going, especially with software engineering. Now, just to hop into some things within the model card. Both Opus and Sonnet were trained on a proprietary mix of publicly available information on the internet as of March 2025. With a lot of the model releases we've seen, it isn't uncommon to have models with a knowledge lag of 6 months, 12 months, or even longer depending on the model. Where that can be problematic, especially in the context of software engineering, is with something like a framework: if you have a model that was trained a year ago, it might not have knowledge of a lot of the new SDKs or even have the context of something like the Model Context Protocol. Whereas a model with information up to just a few months ago gives us a much more capable model with more relevant context for what we might be looking for. Now, in terms of trying out the model, you can try it completely for free at claude.ai. You'll even be able to access some of their features like Artifacts, if you want to build out small web apps.
Now, if you are interested in trying Opus, you will have to get one of their paid plans. A couple of other things to note with the model: there is also prompt caching available. If you want to save on some of the costs of Opus, you have that as an option on its 200,000-token context window. You will also be able to get a 50% discount if you're going to be batch processing. One thing to note is you have prompt caching as an option on both Opus 4 and Sonnet 4 for a number of different applications, and that's going to be able to save a ton of money, especially on those repeated input tokens. So, I'm not going to be diving into demos within this video. I plan on doing videos over the coming days to actually show you how Claude Code, Opus, and Sonnet work. If you're interested in seeing that type of content, just stay tuned to the channel; I'll have some more videos on that coming out shortly. Just to show you a really quick example of the model: I asked it to create a beautiful SaaS landing page. Basically, what we have is a navigation bar, these floating elements, and this nice hero area. If I scroll through the website, I can see all of these nice little tiles. We have the pricing section, and we even have these nice animations as we scroll down the page, along with things like hover effects. As you can see, with a very basic prompt and without much instruction, it's able to give a very respectable starting point for something like a web application. Now, in terms of being able to access the model, you can get Sonnet 4 within Cursor right now. If you want to access Claude Opus 4, given the price, it is available in Max mode only. I believe Cursor did implement API pricing, so if you want to leverage this model, you will be billed at the rate Cursor gets charged by Anthropic. Just know that if you do want to use Opus within Cursor, it is going to cost you.
Another interesting thing I saw today was the CEO of Windsurf mentioning that, unfortunately, Anthropic did not provide their users direct access to Claude Sonnet 4 or Opus 4 on day one, and that they're actively working to find capacity elsewhere. Given that OpenAI is potentially buying Windsurf for several billion dollars, this could be a reason why Anthropic might be dragging their feet on unlocking that bandwidth for Windsurf and its users. Last, I wanted to show you an example of that thinking and reasoning combined with tool use. Just to give you an idea: let's say the model decides it has to hunt for compelling recent AI research and trends. From there, it can invoke a search tool to actually perform that search and return the results. Once it has those results, it can continue thinking before it ultimately gives you the response. Otherwise, that's pretty much it for this video. If you found this video useful, please comment, share, and subscribe. Until the next one.
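The think-search-think loop described above can be sketched as a simple agent turn. Everything below is a local simulation with a stubbed search tool, not Anthropic's actual implementation; a real agent would call a web search API and feed the tool results back into the model's context mid-reasoning:

```python
# Local simulation of interleaved thinking + tool use, as described above.
# stub_search stands in for a real web search tool; nothing here calls an API.

def stub_search(query: str) -> list[str]:
    """Stand-in for a web search tool invocation."""
    return [f"result for: {query}"]

def agent_turn(task: str) -> dict:
    """One turn: think, call a tool mid-reasoning, think again, then answer."""
    trace = []
    trace.append(("thinking", f"I should search for recent work on {task}."))
    results = stub_search(task)          # tool call during the thinking phase
    trace.append(("tool_result", results))
    trace.append(("thinking", "Results in hand; now draft the answer."))
    answer = f"Summary of {len(results)} result(s) on {task}."
    return {"trace": trace, "answer": answer}

turn = agent_turn("AI research trends")
```

The key difference from Claude 3.7-style reasoning is that the tool result lands between two thinking steps rather than only after all reasoning is done.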