
Learn The Fundamentals Of Becoming An AI Engineer On Scrimba: https://scrimba.com/the-ai-engineer-path-c02v?via=developersdigest

DeepSeek V3: The New Game-Changer in AI Models! In this video, we dive deep into DeepSeek V3, the latest groundbreaking AI model from DeepSeek. Released over the holidays, DeepSeek V3 has been turning heads with its impressive performance, often rivaling or even surpassing models like GPT-4o and Anthropic's Claude 3.5 Sonnet. We explore its standout features, including its score of 80 on the reputable Artificial Analysis Quality Index, and how it stacks up against other top models on benchmarks like HumanEval, LiveCodeBench, and Codeforces. Beyond its competitive quality, DeepSeek V3 is significantly more cost-effective: priced at just 50 cents per million tokens, compared to $6 for Anthropic's Claude 3.5 Sonnet, it is more than ten times cheaper than its peers, with an output speed of 88 tokens per second and strong coding and mathematical reasoning capabilities. We also discuss its training details, cost efficiency, and potential impact on the AI landscape, highlighting comments from industry experts. Lastly, don't miss our shoutout to Scrimba, the interactive coding platform that offers a range of courses to further enhance your tech skills.
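The pricing gap described above ($0.50 vs. $6 per million tokens) compounds quickly at application scale. A minimal back-of-the-envelope sketch, using only the two prices quoted in the video (the model labels are illustrative, not official API identifiers):

```python
# Back-of-the-envelope API cost comparison using the per-million-token
# prices quoted in the video.
PRICE_PER_MILLION = {
    "deepseek-v3": 0.50,
    "claude-3.5-sonnet": 6.00,
}

def cost_usd(tokens: int, model: str) -> float:
    """Estimated spend for processing `tokens` tokens with `model`."""
    return tokens / 1_000_000 * PRICE_PER_MILLION[model]

# Example: an app that processes 500M tokens per month.
for model in PRICE_PER_MILLION:
    print(f"{model}: ${cost_usd(500_000_000, model):,.2f}/month")

# The ratio behind "more than ten times cheaper":
print(6.00 / 0.50)  # 12.0
```

At 500M tokens per month this works out to $250 versus $3,000, which is why the video suggests many applications may switch over to try the model.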
Links:
https://www.deepseek.com/
https://chat.deepseek.com/
https://artificialanalysis.ai/models/deepseek-v3/providers
https://x.com/deepseek_ai/status/1872242663489188088/photo/1
https://x.com/deedydas/status/1872360259978924418
https://x.com/nrehiew_/status/1872318161883959485
https://x.com/mathemagic1an/status/1872434434110349434
https://x.com/giffmana/status/1872586401436627211
https://x.com/AravSrinivas/status/1872491784330432692
https://x.com/ArtificialAnlys/status/1872470388439130173
https://x.com/deepseek_ai/status/1872242657348710721
https://x.com/alexandr_wang/status/1872335968608669994
https://x.com/paulgauthier/status/1871919612000092632

Chapters:
00:00 Introduction to DeepSeek V3
00:18 Benchmark Performance Analysis
01:43 Quality and Cost Comparison
02:15 Speed and Practical Applications
05:18 Training Details and Innovations
05:58 Sponsorship Message
07:01 Conclusion and Final Thoughts
---
type: transcript
date: 2024-12-28
youtube_id: RvGmmjO3Tdg
---

# Transcript: DeepSeek V3: Did Open Source Just Surpass GPT-4o and Sonnet 3.5?

We have a brand new model from DeepSeek: DeepSeek V3, a model a ton of people are really excited about. It came out over the holidays, and we really do have a model that performs just as well as Claude 3.5 Sonnet or GPT-4o, and in some cases actually outperforms them. On the coding benchmarks it leads almost across the board. There are some exceptions, like SWE-bench Verified and the Aider edit benchmark, where it doesn't outperform Claude 3.5, but on benchmarks like HumanEval, LiveCodeBench, Codeforces, and a handful of others, it outperforms even Sonnet. Ever since it came out, Sonnet has been thought of as the preferred model, especially for coding applications, but now we have a model that I think a lot of people are going to take a close look at.

Now, if we look at Artificial Analysis, a really great site that benchmarks all the latest models, their Quality Index is a pretty reputable metric that a lot of people look to when new models come out, and this model scores an 80. That puts it right alongside Gemini 1.5 Pro and the October release of Claude 3.5 Sonnet. GPT-4o is down at 73, whereas the October Claude 3.5 Sonnet sits adjacent to DeepSeek V3, which also ties the Gemini 1.5 Pro model released in September. One more aside: this is not a reasoning model, and the only non-reasoning model outperforming it is Gemini 2.0 Flash Experimental. That said, quality is just one of a number of deciding factors in why you would pick a particular model. Where this gets interesting is price: this model costs just 50 cents per million
tokens, whereas Claude 3.5 Sonnet, which ranked the same on the Quality Index, is $6 per million tokens, and GPT-4o is $4.40. It's pretty wild to think you can have a model this cheap that, by the benchmarks, really performs just as well as GPT-4o or Claude 3.5 Sonnet. Now if we look at speed, it falls right in the middle at 88 tokens per second, which is a respectable speed: definitely not the fastest, but not the slowest by any means. So DeepSeek is more than an order of magnitude cheaper than comparable models from the frontier labs, and when you combine quality and price, DeepSeek is a very compelling model. Depending on what we see released from Anthropic and OpenAI in the new year, this could be enough to see a lot of applications switch over to try it. You can go to chat.deepseek.com to try their chat interface, or sign up for their API to access their platform.

Another great way to visualize the pricing is a chart they put out: DeepSeek is right at the top on MMLU, so on the quality benchmark it is right up there with GPT-4o and Claude 3.5 Sonnet, but in terms of pricing this model is much closer to GPT-4o mini than anything else. Another interesting benchmark a lot of people look to when new models come out is the Aider benchmark, a practical benchmark of how well a model performs on real-life tasks. Here it outperforms Claude as well as Gemini Experimental, falling short only to the o1-1217 model that just came out.

In terms of other interesting commentary, Alexandr Wang from Scale AI mentioned that it is quite fitting that DeepSeek, China's leading LLM lab, released its latest model on Christmas: on par with GPT-4o as well
as Claude 3.5 Sonnet, trained with 10x less compute; the bitter lesson of Chinese tech is that they work while Americans rest, and catch up cheaper, faster, and stronger. Here is another visualization comparing the DeepSeek V3 model to some other open-source models as well as GPT-4o, and another showing some of the coding benchmarks on the right and other benchmarks, like MMLU-Pro, on the left. We can see that it really outperforms in particular areas such as math and coding.

Artificial Analysis wrote that there is a new leader in open-source AI: their independent benchmarks show China-based DeepSeek's V3 model ahead of all open-weights models released to date, beating OpenAI's GPT-4o and approaching Claude 3.5 Sonnet (the October release). The parameter count of this model is 2.8 times larger than their previous DeepSeek 2.5 model. They highlight that DeepSeek V3 likely has particularly strong coding and mathematical reasoning capability, with scores of 92% on HumanEval and 85% on MATH-500. They also mention that DeepSeek's first-party API for V3 achieves output speeds of 89 tokens per second, four times faster than their previous model, which only output 18 tokens per second.

All right, now for the training details. DeepSeek V3 was trained on 14.8 trillion tokens in just 2.788 million Nvidia H800 GPU hours, implying a cost of about $5.6 million based on rental pricing of $2 per GPU hour. That's just 57 days on DeepSeek's 2048-GPU H800 cluster. Another interesting piece is that they used their reasoning model for distillation: while reasoning models like OpenAI's o1 series may not be suitable for many use cases because of their cost and latency, that is less of a barrier for generating training data, so they used outputs from their reasoning model to generate data to train this model.

This video is brought to you by Scrimba, the innovative coding platform that brings interactive learning to life. Dive into a variety of courses, from AI
engineering to front-end, Python, UI design, and much more. Scrimba's game-changing feature is their unique "scrim" screencast format, which lets you pause the lesson anytime and start directly editing the teacher's code. Their curriculum is built in collaboration with industry leaders including Mozilla's MDN, Hugging Face, and LangChain, includes building applications with OpenAI, Claude, and Mistral models, and guides you on deploying projects to platforms like Cloudflare. While AI tools can assist with coding, a solid grasp of the fundamentals is essential for gaining real experience. Scrimba offers something for everyone, from complete beginners to advanced developers, and about 80% of Scrimba's content is completely free. Sign up for a free account today using my link below and enjoy an extra 20% discount on their Pro plans when you're ready to upgrade. I'm sure you'll love it.

Finally, to close it out, they said they assess DeepSeek V3 to be a highly significant release: it reflects DeepSeek's significant contribution to the open-source AI community, as well as the continuation of the trend of Chinese AI labs ascending to a clear global second place behind the US. The CEO of Perplexity, Aravind Srinivas, said that China trained a model called DeepSeek V3 that's better than OpenAI's GPT-4o, with a fraction of the budget, and open-sourced it, while meanwhile a bipartisan bill is in the works in America to impose severe restrictions on open-sourcing frontier models.

Another interesting thing with this model: when some people asked it what model it is, it would actually say, "I'm ChatGPT, a language model developed by OpenAI." I tried this before recording this video and also had it say it was ChatGPT, but at the time of recording, if I ask "what model are you?", I very quickly get back a response that it is DeepSeek V3; I no longer get the old answer. An interesting perspective tied to that is from Jay Hack
here: DeepSeek V3 is an order of magnitude cheaper because it likely trained on frontier-model outputs, in obvious violation of terms of service, and terms-of-service laundering by training on DeepSeek outputs is impossible to prevent, which does not bode well for the economics of training frontier models. That is a really interesting point, because as you think about all of these models, and the cost and hardware involved in training them (I think GPT-4o was over a hundred million dollars to train), there may eventually be billion-dollar training runs. The economics for something like that might not make sense right now, but in the future, if you're spending a billion dollars on training and some labs will potentially just take the outputs, distill a model, and create their own, it's an interesting question. An interesting response to this from Teknium is that it's quite the cope to say it's okay for OpenAI to train on all the terms-of-service-violating and copyrighted data in the world and claim it's legit, while calling this unfathomable. An interesting thought, because OpenAI, as well as, I presume, a number of the other frontier labs, faces a number of lawsuits over the question of copyright: are you okay to go and scrape essentially the whole internet without having to pay to access that information? That's an interesting debate tied to all of this as well. One more thing to note: the context window for the model is 128,000 tokens. Otherwise, that's it for this video. I'll link everything in the description. If you found this video useful, please comment, share, and subscribe.
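The training budget discussed in the transcript can be sanity-checked with simple arithmetic. This sketch assumes the figures reported for DeepSeek V3 (2.788M H800 GPU-hours, a $2/GPU-hour rental rate, and a 2048-GPU cluster) and shows they are internally consistent with the quoted "$5.6 million" and "57 days":

```python
# Sanity-checking DeepSeek V3's reported training budget.
gpu_hours = 2_788_000      # total H800 GPU-hours reported for training
rental_rate_usd = 2.0      # assumed rental price per GPU-hour
cluster_size = 2048        # H800 GPUs in the training cluster

implied_cost_usd = gpu_hours * rental_rate_usd
wall_clock_days = gpu_hours / cluster_size / 24

print(f"Implied cost: ${implied_cost_usd / 1e6:.2f}M")  # ~$5.58M
print(f"Wall-clock time: {wall_clock_days:.0f} days")   # ~57 days
```

The implied cost rounds to the widely quoted $5.6M figure, and the wall-clock estimate matches the "just 57 days" claim, which is part of why the cost-efficiency numbers drew so much commentary.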