
In this video, I discuss the latest release from Anthropic, the Claude 3.5 Sonnet model. This new AI model significantly outperforms its predecessor, Claude 3 Opus, on most benchmarks, including graduate-level reasoning and coding tasks. I break down the blog post, review the accompanying research paper, and demonstrate one of its standout features, Artifacts, which offers advanced code interpretation and visualization capabilities. Available on multiple platforms, Claude 3.5 Sonnet is not only faster but also more competitively priced. The video also touches on the model's exceptional vision capabilities and the future updates expected for the Claude 3.5 series.
---
type: transcript
date: 2024-06-20
youtube_id: b0IsiJkP1nQ
---

# Transcript: Claude 3.5 Sonnet 👑: Anthropic's Answer to GPT-4o

Anthropic came out today with a surprise announcement: the release of Claude 3.5 Sonnet, and this model has already outperformed their flagship Claude 3 Opus model, which was released only about 100 days ago. When you compare Claude 3.5 Sonnet across the board, it ranks best in class on almost all metrics, with the exception of math problem solving and the zero-shot chain-of-thought MMLU score, across a number of different benchmarks, from graduate-level reasoning to HumanEval, the coding benchmark, where it scores 92%. When you compare that to GPT-4o or Opus, this is a significant improvement.

In this video I'm going to go over the blog post, then touch on the paper they released, and finally I'm going to show you probably one of the coolest features I've seen in any of these large language model interfaces from the big providers: what they're calling Artifacts, which I'll show you at the end of the video. They mention in the blog post that this new model outperforms a number of competitor models across a wide range of evaluations. Claude 3.5 Sonnet is now available basically anywhere you could previously access their models, so you can access it on claude.ai,
within their iOS app, or from their API: you can use the Anthropic API, Amazon Bedrock, or the Google Cloud Vertex AI API. The model is priced at $3 per million input tokens and $15 per million output tokens, with a 200,000-token context window. Considering that it is more performant than Opus in both speed and capability, this is a very competitive price point for their API.

The other impressive thing is that it operates at twice the pace of Claude 3 Opus. To have a frontier model like this that is already twice as fast, just about 100 days later, speaks for itself. The things they really called out as standouts are graduate-level reasoning, MMLU, and coding performance; if you're going to use this for coding tasks, it is going to be a really good model. The other interesting thing here is that in an internal agentic coding evaluation, Sonnet solved 64% of the problems, outperforming Claude 3 Opus, which solved only 38%. Agents have had a ton of buzz this year, and it's still very early; you can see these numbers are still relatively low, considering you want them as close to 100% as possible, but we're starting to see significant leaps in performance over the previous iterations of these models. For this evaluation, they mention it tests the model's ability to fix a bug or add functionality to an open-source codebase, given a natural language description of the desired improvement. They mention that when instructed and provided with the relevant tools, Claude 3.5 Sonnet can independently write, edit, or execute code with sophisticated reasoning and troubleshooting capabilities. It also handles code translations with ease, making it particularly effective for updating legacy
applications and migrating codebases.

Another exciting piece, as we're starting to see more of these multimodal models come online, is that this is their strongest vision model to date. They describe the improvements as most notable for tasks that require visual reasoning, like interpreting charts or graphs, but it can also accurately transcribe text from imperfect images. You can think of retail examples: deciphering a receipt, logistics information, invoices, and so on. It's one thing to be able to OCR clean text, but it's a whole other thing to transcribe text from imperfect images. Now I just want to quickly show you one of the really good demonstrations of the vision capabilities, to give you a sense of what you can do with this new model, especially within the Claude interface in their web app. You see here that they're uploading a number of different charts along with some simple instructions, asking for all of the data to be transcribed as JSON, and it's outputting into this nice little panel on the right-hand side, what they're calling an artifact, which can be used like a code interpreter. But you can also use it to visualize things, like if you're making an SVG or rendering something like a website or a web app. This new right-hand panel you see on screen is what they're calling Artifacts, and from the chat window you're able to create these new artifacts. Just like this, you can see it using a code-interpreter-style artifact, writing all the code to visualize the context it has from the previous step. You can start to get an idea of how you can combine that vision capability with the model's new reasoning capability to get these really
interesting outputs. You can see there it made a presentation and an interactive chart, all of which you can view within the artifacts. In terms of the evaluation metrics for the vision capability, it has leading performance basically across the board, with just one exception where GPT-4o beats it by a slight margin.

As I showed you earlier, the Artifacts feature is a really great addition, which I think is going to drive a lot of people from ChatGPT to the Claude web app and the Claude app, because in addition to the chat interface you now have this preview window, which you can leverage for things like creating visualizations, websites, and a number of other things you can interact with. Instead of having to copy and paste code from what ChatGPT or Claude has created, you can see it all rendered and streamed out right within the artifacts viewer, inside the chat interface, which makes the chat interface a lot more interactive. The first time I saw something similar was actually with Gemini, where you could open a piece of code in something like a Replit workspace, but this takes it one step further: instead of linking out to a separate platform, it's all within the Claude interface itself. This is going to be incredibly useful. I was trying it out a little earlier; you can go over to claude.ai,
create an account if you don't already have one, and start to play around with it. If I say "create me a classic helicopter game" and click enter... you do have to turn on Artifacts, just like you saw in that video, and once it's turned on you'll have this interface. This is my example here, and you can see how much quicker this model is. The combination of how quick it is and how competent it is gives you a really good sense of the different things you can do with it. You have this classic helicopter game, you can see the preview there, there's even a game over when a collision is detected, and then you have all of the code here. This is going to be incredibly useful for verifying that the output you've gotten is actually effective and working; having a sandbox environment like Artifacts, especially in a coding context, is going to be amazing to play around with.

There's a little bit more on safety and privacy that you can read up on in the blog post. The other exciting thing is that they're also going to release the other versions of Claude 3.5, both the Haiku model and a 3.5 Opus, later this year. I can only imagine what the Opus version of this will be capable of when it's released. They also mention that they're developing new modalities for different use cases, and exploring features like memory, to support user preferences and let the model work with the history you've built up from interacting with Claude. Overall, of all the recent announcements, this is probably one of the more exciting ones I've seen, because not only is it a more capable model, you also get this new UX with the Artifacts window. Now if you're looking for more information on the model itself,
you can check out the paper, which I'll link. There are a number of interesting things in it: they included a needle-in-a-haystack evaluation to give you a sense of how capable the model is at recalling context, especially at the larger limits like its 200,000-token context window, and it also plots the model's capabilities across a number of different areas, giving you metrics across coding, finance, law, medicine, etc. You can see some of these, finance, law, even philosophy, are considerably higher than Claude 3 Opus, and you have to remember that Claude 3 was released only about 100 days ago. Huge kudos to the team over at Anthropic, both for releasing this new capable model and for this innovative new UX for leveraging these models. If you found this video useful, please like, comment, share, and subscribe. Otherwise, until the next one.
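As a quick sanity check on the pricing quoted in the video ($3 per million input tokens, $15 per million output tokens, 200,000-token context window), here's a minimal cost-estimator sketch. The function name and structure are my own illustration, not anything from Anthropic's docs:

```python
# Rough cost estimator for the Claude 3.5 Sonnet pricing quoted in the video:
# $3 per million input tokens, $15 per million output tokens.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated API cost in dollars for one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_MTOK + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MTOK

# Example: a prompt that nearly fills the 200,000-token context window
# and produces a 4,000-token answer.
print(round(estimate_cost(200_000, 4_000), 2))  # 0.66
```

So even a request that maxes out the context window stays under a dollar, which is what makes the price point competitive relative to Opus.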
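The chart-to-JSON vision demo shown in the video can also be reproduced through the Messages API, which accepts base64-encoded image content blocks alongside text. Below is a hedged sketch of what such a request body looks like; the helper function is my own, and you should check Anthropic's API docs for the current model ID (the 2024-06-20 Sonnet snapshot is assumed here):

```python
import base64

def build_vision_request(image_bytes: bytes, prompt: str) -> dict:
    """Build a Messages API request body that pairs one image with a text
    instruction, as in the chart-transcription demo from the video."""
    return {
        "model": "claude-3-5-sonnet-20240620",  # assumed snapshot ID; verify against current docs
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": base64.b64encode(image_bytes).decode("ascii"),
                    },
                },
                {"type": "text", "text": prompt},
            ],
        }],
    }

# The resulting dict is the JSON body you would POST to the Messages endpoint
# (or pass to an Anthropic client library) with your API key.
request = build_vision_request(b"<png bytes here>",
                               "Transcribe all of the data in this chart as JSON.")
```

The same shape extends to multiple images per message, which is how the multi-chart upload in the demo works.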