
---
type: transcript
date: 2025-03-25
youtube_id: MvNp9-OGPKw
---

# Transcript: DeepSeek V3 0324 in 6 Minutes: Better than GPT 4.5 & Sonnet 3.7?

In this video, I'll dive into the latest update of DeepSeek's V3 model, recently released on Hugging Face. The model stands out for its impressive benchmark results, outclassing even GPT-4.5 and Claude 3.7 Sonnet on math and coding tasks. It's also remarkably cost-efficient, available at a blended rate of just 80 cents per million tokens. With a context window of 128,000 tokens and text-only inputs and outputs (no multimodal support), it's a significant leap from its predecessor. Watch the video to see how to get started, detailed demos, and reactions from the AI community on platforms like X. Kudos to DeepSeek for pushing the boundaries of open-source AI! 🙌💡

- 00:00 Introduction to DeepSeek V3 Update
- 00:28 Model Performance and Benchmarks
- 01:08 Pricing and Cost Comparison
- 02:08 Reactions and Community Feedback
- 02:47 Technical Specifications and Requirements
- 03:19 Running the Model and Licensing
- 04:28 Accessing and Using the Model
- 05:56 Conclusion and Final Thoughts
DeepSeek just quietly released an update to their V3 model on Hugging Face yesterday, only a few months after the original came out, and it now looks to be potentially one of the best non-reasoning models in the world. In this video, I'm going to go over some of the details, show you how to get started with it, and then show you some demos of what it looks like.

When this originally dropped, all they published were the model weights and an MIT license. Since then, they've added details to the readme file, and it is incredibly impressive. Across a number of benchmarks, the model performs very well on particular tasks such as math and coding, even outperforming GPT-4.5 and Claude 3.7 Sonnet. And if we compare it to the previous version, we also see a significant leap in performance from a model that was released just a number of months ago.

While these benchmarks are impressive, what's even more impressive is that this is an open-source model, and so is its pricing when we compare it to some of the flagship frontier models from OpenAI and Anthropic. This model not only outperforms those models on a number of different tasks, it is also considerably cheaper. Just to demonstrate what I mean: Artificial Analysis is a really great site that does independent analysis of AI models and API providers. If we take a look at the pricing, the blended rate across input and output tokens (which vary in cost) is only 80 cents per million tokens. Compare that to Claude 3.7 Sonnet at $6 per million tokens blended, and GPT-4.5 at $93.80 per million tokens blended.

In terms of performance, Artificial Analysis also publishes what they call their Intelligence Index, and this model ranks at the top across all other non-reasoning models. It outperforms Grok 3, Gemini 2.0 Pro, Sonnet 3.7, GPT-4o, and so on. The only models that outperform it are the reasoning models. If you're not familiar with reasoning models, these are models that think before they give you a response, so the trade-off is that you get a better answer, but you have to wait longer for it.

Now, just to go over some reactions on X. Artificial Analysis, after running their analysis, wrote that three months ago DeepSeek released V3 and they called it a new leader in open-source AI, noting that V3 came close to the leading proprietary models from Anthropic and Google but did not surpass them. Today, DeepSeek are not just releasing the best open-source model; DeepSeek are now driving the frontier of non-reasoning open-weight models, eclipsing all proprietary non-reasoning models, including Gemini 2.0 Pro, Claude 3.7 Sonnet, and Llama 3.3 70B. This release is arguably even more impressive than R1 and potentially indicates that R2 is going to be another significant leap forward.

Now, in terms of the model itself: the context window is 128,000 tokens, though mind you, it's limited to 64,000 if you go directly through DeepSeek's first-party API. And in terms of total parameters, this is not something you can run at home unless you have something akin to a $10,000 Mac, which I'll touch on in just a moment.
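To put those blended rates from earlier in perspective, here's a quick back-of-the-envelope cost comparison (a minimal sketch: the per-million-token rates are the Artificial Analysis figures quoted above, and the 10-million-token workload is purely hypothetical):

```python
# Rough cost comparison using the blended per-million-token rates
# quoted above from Artificial Analysis (USD per 1M tokens).
BLENDED_RATES_USD_PER_M = {
    "DeepSeek V3 0324": 0.80,
    "Claude 3.7 Sonnet": 6.00,
    "GPT-4.5": 93.80,
}

TOKENS = 10_000_000  # hypothetical workload: 10M blended tokens

for model, rate in BLENDED_RATES_USD_PER_M.items():
    cost = TOKENS / 1_000_000 * rate
    print(f"{model}: ${cost:,.2f}")
# DeepSeek V3 0324: $8.00
# Claude 3.7 Sonnet: $60.00
# GPT-4.5: $938.00
```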
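As for why the hardware bar is so high: DeepSeek-V3 has 671 billion total parameters per its published model card (it's a mixture-of-experts model, so only around 37B are active per token, but all of the weights still have to sit in memory). A rough estimate of the 4-bit footprint, ignoring KV cache and runtime overhead, looks like this:

```python
# Back-of-the-envelope: why a 512 GB Mac can hold DeepSeek V3 at 4-bit.
# Assumes the published 671B total-parameter count; KV cache and runtime
# overhead are ignored, so the real requirement is somewhat higher.
TOTAL_PARAMS = 671e9
BYTES_PER_PARAM = 0.5  # 4-bit quantization = half a byte per weight

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
print(f"~{weights_gb:.0f} GB of weights")  # ~336 GB, under 512 GB of RAM
```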
Another thing to note is that the model is text only; there are no multimodal inputs or outputs. In other words, you can't pass in things like audio files, videos, or images, and as you might expect, it won't generate those either.

In terms of a couple of other reactions, here is someone actually running this on a 512 GB M3 Ultra and getting 20 tokens per second. Now, this is running at 4-bit, but you can see how fast it is on the right-hand side here. That said, that is an M3 Ultra with 512 GB of RAM; a similar reaction notes that you can run this on a $10,000 Mac Studio. And just to really put this into perspective, it has an MIT license, so you can commercialize it, use it for inference APIs, or build it into products.

In terms of some other benchmarks, there's the Aider Polyglot benchmark, one that a lot of developers look to. It's noted there that the model has significantly improved over the prior version and is currently number two among non-thinking models, behind only Sonnet 3.7, with the note that V3 is competitive with thinking models like R1 and o3-mini.

A common theme coming up on X now is that a lot of people have high hopes for DeepSeek R2, given how strong an update this new V3 model is. And honestly, at the pace that DeepSeek has caught up to the frontier, I wouldn't be surprised if DeepSeek R2, when it comes out, is not just the best open-source model; there's a chance it could be the best model in the world, period, at least until another model leapfrogs it, as we've seen happen before.

At the time of recording, if you're looking to access the model, you'll be able to find it on Hyperbolic, Nebius, Fireworks, and DeepInfra; you can see the respective output speeds and prices on Artificial Analysis. I tested this out on both Fireworks and the DeepSeek API. If you go to deepseek.com, you'll be able to access the API platform, or alternatively, if you just want to try it from a ChatGPT-like interface, that's the model they're using in their chat interface right now.

Now, if you do decide to use DeepSeek through their first-party API, there are a number of nice things, though mind you, there is the trade-off that you'll only get up to 64,000 tokens of input context. A nice feature of their API is context caching: you'll be able to get rates as cheap as 7 cents per million tokens on a cache hit, and they even offer a discounted price, presumably for off-peak hours, of 50% off, which with a cache hit comes to 3.5 cents per million tokens to access this latest DeepSeek V3 model.

I tested this endpoint from both Fireworks and the DeepSeek first-party API within this artifacts project. The one thing I found is that the outputs, especially for front-end development, are very strong. On top of that, DeepSeek specifically called out front-end web development, saying they've improved the executability of the code and produce more aesthetically pleasing web pages and game front-ends. I can definitely attest to the model's strengths; I tested it on a number of different front-end tasks, and the only other models I think would have come close are Sonnet 3.5 and Sonnet 3.7.

But otherwise, that's pretty much it for this video. Kudos to the team at DeepSeek for their contributions to open source. If you found this video useful, please comment, share, and subscribe. Otherwise, until the next one.
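As a quick reference for the API access discussed above: DeepSeek's first-party API is OpenAI-compatible, so a minimal call looks roughly like this (a sketch, not official sample code; per DeepSeek's public docs the chat model id is `deepseek-chat` and the base URL is `https://api.deepseek.com`, and the API key below is a placeholder). Pointing the same client at a provider like Fireworks is mostly a matter of swapping the base URL and model id.

```python
# Minimal sketch: calling the updated V3 model through DeepSeek's
# first-party, OpenAI-compatible API. "deepseek-chat" points at the
# latest V3 checkpoint; input context here is capped at 64K tokens, and
# repeated prompt prefixes are billed at the cheaper cache-hit rate.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",  # placeholder: use your real key
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": "Build a small landing page in a single HTML file."}
    ],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```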