
Introducing Claude 3.7 & Claude Code Anthropic has unveiled Claude 3.7, their most advanced hybrid reasoning model to date, offering both near-instant responses and extended step-by-step thinking capabilities. This release also includes Claude Code, a tool for developers in research preview, which enables significant improvements in coding tasks. Claude 3.7 represents a substantial leap from its predecessor, Claude 3.5, particularly in real-world tasks like math, coding, and more. The model shows promise in applications ranging from repository management to UI generation, with impressive performance benchmarks and visible thought processes. Additionally, it features a unique 'thinking budget' and can handle up to 200,000 tokens. Early access is available for Claude Code, with limited seats. https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview 00:00 Introduction to Claude 3 00:25 Claude 3.7 Sonnet: A Leap in Performance 00:36 Extended Thinking Capabilities 01:58 Claude Code: A New Tool for Developers 02:54 System Requirements and Installation 03:57 Visible Thought Process and Alignment 04:51 Action Scaling and Real-World Applications 05:28 Performance Benchmarks and Fun Use Cases 06:33 Generating UI Components with Claude 3.7 07:37 Conclusion and Final Thoughts
--- type: transcript date: 2025-02-24 youtube_id: LmBvVZbLN34 --- # Transcript: Anthropic Claude Sonnet 3.7 in 8 Minutes anthropic has just released Claude 3.7 Sonet the most intelligent model to date this is what they're calling a hybrid reasoning model what's unique with this model is it can both Produce near instant responses or you can enable extended step-by-step thinking as they describe it is one model but effectively two different ways to think in addition to this language model they're also releasing a tool called Claud code which is now in research preview in terms of the benchmarks 3.7 Sonet is a significant improvement over its predecessor if we just look at the chart here basically across the board we can see that clae 3.5 Sonet even without any extended thinking that it really ranks up with the performance of all of the reasoning models where this model really shines is in its extended thinking capabilities in areas like math physics instruction following coding and other tasks even though this model just came out today I'm going to go out on a limb and say that this is probably going to be the best coding model and the preferred model for Developers over the coming weeks CLA 3.5 originally came on the scene in I believe it was June of last year and ever since that model came out that model really enabled a ton of different companies to reach product Market fits we saw cursor really take off we saw tools like bolt. new really take off and a large part of those tools success was at its core Sonet 3.5 this isn't to Discount what those teams have done but having a really strong Foundation model with the coding capabilities it just allows you to build build out these applications that you weren't previously able to and with that in mind we can still see what a significant leap 3.5 to 3.7 is now the other thing that's interesting with this is in developing it we've optimized somewhat less for math and computer science competition problems and instead shifted Focus towards real world tasks that better reflect the needs of our users instead of focusing on these code competition tasks that might not be as applicable to software Engineers day-to-day next in addition to the model what they also announced was Claud code Cloud code is an agentic tool that you can run within your terminal and here's just a quick demonstration of it once you have it installed you'll be able to run Claud this is going to work through the anthropic API once you're within the root of your project it will not only be able to answer questions about your repository but also Implement changes if you have a change that involves multiple files it will go and search for those files read through those files and then ultimately update those files the tool can also run terminal commands if you're trying to compile it push a hub or run a series of tests you'll be able to do all of that within the terminal app and I also plan to make another video on this specifically in the coming days it currently is in a limited preview it is first come first serve at time of recording I will put that within the description of the video able to successfully get access to this just hours after trying but with that being said I'm not sure exactly how many seats that there are for this tool so in terms of system requirements there are pretty humble system requirements here to set it up you can run through the installation steps go within the root of the project that you want to run the tool on you can launch Cloud code and be able to set up the authentication through the anthropic API then you'll be Off to the Races now another thing to note within the cloud. you will be able to go over to GitHub and you can directly add the context of your application with the input here that is a really novel use case even if they're private you'll be able to access and select all of the different contexts that you want within the repository now you are also going to be able to access this model from the artifacts Paine the other great thing with the model is you will be able to select from the API when and when not to turn off in extended thinking mode so you will be able to set as a developer the quote unquote thinking budget to control precisely how long Claude spends on a particular problem they mention that this extended mode is not an option that switches to a different model in a separate strategy instead it's the very same model it just gives itself more time to think and expend more effort in coming to an answer now another thing that a lot of people will appreciate is that they are going to give a visible thought process they're not abstracting the thought process away they have decided to make the thought process visible in raw form they mentioned that this has several benefits so trust being able to observe the way that Claude thinks it makes it easier to understand check its answer and it might help users get better outputs alignment in some of our previous alignment science research we've used contradictions between what the model inwardly thinks and what it outwardly says to identify when it might be engaging in concerning behaviors like deception an interest it's often fascinating to watch Claude think and this is something that I have to say when I originally saw the Deep seek R1 model that was probably one of the most interesting pieces of using the model is actually being able to read through the thought process and understand why it's making the particular decisions that it's decided on another interesting piece with the news release Here is that 3.5 benefits from what they call Action scaling an improv capability that allows it to iteratively call functions respond to environmental changes and continue until its open-ended task is complete one example of such task is using a computer Cloud can issue virtual Mouse clicks virtual keyboard presses to solve tasks on a user's behalf now in terms of the OS World Benchmark so this is its ability to actually use the computer that computer use agent that we saw they released in the fall we can see that this is a significant improvement over the previous model now it still is between about 25 and 30% there still is a ways to go but but I'd imagine that over the coming months and years that we'll increasingly see this Benchmark as well become saturated now another interesting and really funny example that they had is claude's ability to play the game Pokemon we can see some of the previous models and how they ranked on the particular toss and how far it could get within the game Claude 3.5 Sonet was able to take 35,000 actions what was able to get past Vermillion City and get the surge badge it's pretty wild to think that now we have a model that is capable of actually playing some of these RPG games this is both a fun and interesting Benchmark to think about now in terms of how the thinking and reasoning model works is you can allocate a budget to how long you want to have the model think for if you are working on particularly tough tasks the more tokens that you allocate towards the thinking the longer that it's going to take but also the better the response that it's ultimately going to give you within the post there are a ton of different benchmarks you can see how it performs across biology chemistry physics as well as how it performs with that extended thinking as well now in terms of the speed of the model it is a relatively fast model here is a prompt where I asked it to create a beautiful SAS landing page with pricing in great detail it was able to write several hundred lines of code in just a number of seconds now if I take a look at preview I'm going to go out on a limb and say that this is probably going to be the best model for generating UI components this is definitely a model that does perform very well at generating UI components so we can take a look at all of this we also even have animation within this if I scroll down we we can see what looks like honestly a respectable start to a website just judging by what it generated it does have very nice UI components this is probably one of the best results for a question like this that I have got from an llm oftentimes to get something this polished out of the gate you would have otherwise had to have a pretty involved prompt to generate something like this in terms of the context window for the model you'll be able to send in up to 200,000 tokens for the output it has up to12800 ,000 tokens of contact although 64,000 is generally available and 128k is available at least as beta and I believe it would also be the same on gcp as well as anthropic themselves overall that's pretty much it for this video kudos to the team over at anthropic for releasing a model that I'm sure developers are going to love if you found this video useful please like comment share and subscribe otherwise until the next one
Weekly deep dives on AI agents, coding tools, and building with LLMs - delivered to your inbox.
Free forever. No spam.
Subscribe FreeNew tutorials, open-source projects, and deep dives on coding agents - delivered weekly.
Technical content at the intersection of AI and development. Building with AI agents, Claude Code, and modern dev tools - then showing you exactly how it works.