
Exploring Google's New Gemini 2.5 Flash

In this video, we dive into Google's newly released Gemini 2.5 Flash model, a cost-effective and flexible thinking model. Key features include multimodal input capabilities, a vast context window of up to a million tokens, and fine-grained control over the thinking process. We discuss its competitive pricing and impressive performance on various benchmarks. You can experiment with the model on Vertex AI or AI Studio. Tune in to find out more about the revolutionary advancements and potent capabilities of Gemini 2.5 Flash!

- 00:00 Introduction to Gemini 2.5 Flash
- 00:04 Key Features and Flexibility
- 00:18 Thinking Strategies and Control
- 01:23 Pricing and Cost Efficiency
- 01:38 Benchmark Performance
- 03:09 Multimodal Capabilities
- 03:35 Access and Usage
- 05:26 Demonstration and Final Thoughts
---
type: transcript
date: 2025-04-17
youtube_id: EEDjm7VNrV4
---

# Transcript: Gemini 2.5 Flash in 6 Minutes

Google has just released Gemini 2.5 Flash, their latest cost-effective thinking model. Now, what's really great with this model is not just its price point, but also its flexibility. You're going to be able to pass in not just text, but also audio, images, as well as videos. And for your context window, you're going to be able to pass in up to a million tokens of context.

So, an interesting aspect of these models is that they're calibrated for quote-unquote thinking strategies across diverse scenarios, leading to more accurate and relevant outputs. There are going to be different approaches to how the model thinks through those potential scenarios. Now, the other great thing with this is you're going to have fine-grained control over the model's thinking process. You're going to be allowed to manage the resources. Just to demonstrate that, if you want to try this out, you're going to be able to enable or disable thinking mode. And further, you can go and set your thinking budget: you'll be able to set how many tokens you want the model to reason for. For instance, if you want to dial it all the way up, you're going to be able to set it to almost 25,000 tokens. Or alternatively, if you just want it to think for a very brief number of tokens, you can set that as well. aistudio.google.com is an option to try this out. I'll also put all of the links to what I'm showing you in the description of the video.

And then finally, they mentioned that when no thinking budget is set, the model is still able to assess the complexity of a task and will calibrate the amount of thinking accordingly. So you don't need to explicitly set that thinking budget; it's a potential option.

One thing that really stands out with the model is its price. It's going to be 15 cents per million tokens of input, 60 cents per million tokens of output, and $3.50 per million tokens of output when reasoning is enabled.
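Those rates make per-request costs easy to estimate. Here's a quick back-of-envelope calculator (a sketch only: the token counts in the example are made up, and it assumes the simple reading that all output tokens bill at the reasoning rate when thinking is on):

```python
def request_cost(input_tokens: int, output_tokens: int, thinking: bool = False) -> float:
    """Estimated USD cost of one Gemini 2.5 Flash request at the quoted rates."""
    INPUT_RATE = 0.15 / 1_000_000     # $ per input token
    OUTPUT_RATE = 0.60 / 1_000_000    # $ per output token, thinking off
    THINKING_RATE = 3.50 / 1_000_000  # $ per output token, thinking on
    out_rate = THINKING_RATE if thinking else OUTPUT_RATE
    return input_tokens * INPUT_RATE + output_tokens * out_rate

# Example: 10k tokens in, 2k tokens out.
print(f"${request_cost(10_000, 2_000):.4f}")                 # thinking off
print(f"${request_cost(10_000, 2_000, thinking=True):.4f}")  # thinking on
```

Even with thinking turned all the way up, a fairly large request stays under a cent, which is the point the blended-rate comparison below makes.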
So, in terms of the blended rate, this model is going to be amongst the cheapest models out there. But what's really impressive is that if we look at some of the benchmarks, like Humanity's Last Exam, this model is just shy of o4-mini, which just came out yesterday. And when we compare that to Claude 3.7 Sonnet with thinking, it even outperforms that. And on GPQA Diamond the story is similar: comparing Flash against o4-mini, Sonnet 3.7, Grok 3 Beta, as well as R1, we see this model scores better than all of them.

Now, if we take a look at some of the pricing, what's really stark here is that Sonnet, for instance, is $3 per million tokens of input and $15 per million tokens of output, and it's a similar story for Grok 3 Beta as well. This model, in terms of price to performance, is honestly quite incredible.

And if I look at the AIME benchmark, for instance, for 2025 as well as 2024, what's really amazing is you can see the jump from 27.5% for the previous generation of Gemini Flash to 78%, and for 2024, from 32% to 88%. While it doesn't outperform on a number of metrics, just remember that this is amongst the cheapest options given the capabilities. For LiveCodeBench, it doesn't quite reach the capabilities of Grok 3 or R1, and presumably it also doesn't quite reach the capabilities of o4-mini or Sonnet 3.7. Similarly, on the Aider Polyglot benchmark, it doesn't quite outperform some of those other models. It does look like it is particularly strong in areas like mathematics, and it does look like you're going to be able to get some good generations with code, but it might not be quite as capable as some of the other models out there.

In terms of the multimodal capabilities, we see that this is a very competitive model. So, OpenAI just released o4-mini yesterday, and we see this number is just shy of it. Beyond that, we can see it basically outperform everything else on these benchmarks.
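To make the multimodal input concrete, here's a rough sketch of what a mixed text-plus-image request looks like at the request level. This is a REST-style `generateContent` payload built by hand; the helper function and example bytes are illustrative, not from the video:

```python
import base64


def multimodal_request(prompt: str, media_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a generateContent-style payload mixing text and inline media."""
    return {
        "contents": [{
            "role": "user",
            "parts": [
                {"text": prompt},
                # Inline media is sent base64-encoded alongside the text part.
                {"inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(media_bytes).decode("ascii"),
                }},
            ],
        }]
    }


# Audio and video follow the same shape, just with a different MIME type
# (e.g. "audio/mp3" or "video/mp4") and larger payloads.
req = multimodal_request("Describe this chart.", b"\x89PNG...fake bytes...")
```

For large files (like the 45-minute videos mentioned below), you'd typically upload the file first and reference it rather than inline it, but the part-based structure is the same idea.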
We can see that this is a very effective model in terms of multimodality. Given the price, this model basically outperforms everything with the exception of o4-mini, which just came out yesterday. For long context, it is a significant improvement over the previous generation as well.

Now, in terms of being able to access the model, you're going to be able to access this on Vertex AI right now. Just to give you an idea of how much you can put within that million tokens of context: it's about 45 minutes of video with audio that you can pass into the model, if you want things like timestamps or to get certain segments from a particular clip. You're going to be able to pass in up to 3,000 images or documents. Now, obviously that's going to vary depending on the size and how much context is in each of those images or documents, but it is a very flexible model. In terms of audio, you're going to be able to pass in 8 and a half hours of audio as well, if you're looking to do analysis or a summary. Basically, across the board, regardless of what you want to pass in, it's going to be a super flexible model.

You'll also be able to get the model at gemini.google.com, and you can go ahead and try it out even if you don't have the Gemini Advanced paid tier. Just to demonstrate Gemini, I can say something like, "Tell me about Google's release of Gemini 2.5 Flash today." And the great thing with the model is it does have tool calls available. You can go and look at the thinking trace, and it will search through relevant information to inform the context that it needs to answer your question. You have the ability to see the different references throughout the different pieces of the response here.

Additionally, arguably one of the easiest places to get an API key is from AI Studio. The great thing with this is they do also have a generous free tier.
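Before grabbing a snippet from the playground, it helps to know what the thinking controls described earlier look like in a request. Here's a sketch of a REST-style `generateContent` payload; `thinkingBudget` is the documented field name, and the 24,576 value here reflects the roughly-25,000-token ceiling mentioned above (the helper function itself is illustrative):

```python
from typing import Optional


def build_request(prompt: str, thinking_budget: Optional[int] = None) -> dict:
    """Build a generateContent-style payload for gemini-2.5-flash.

    thinking_budget=0 disables thinking entirely; leaving it unset lets the
    model calibrate how much to think based on task complexity.
    """
    payload = {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
    }
    if thinking_budget is not None:
        payload["generationConfig"] = {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        }
    return payload


# Dial thinking all the way up:
req = build_request("Plan a migration to microservices.", thinking_budget=24576)
```

The code snippets AI Studio generates use Google's client SDK instead of raw payloads, but the same budget knob is what you're setting in either case.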
If you just want to try this out with a hobby app, you can go ahead and grab an API key and try all of this out. For instance, you can set your thinking budget and enable Google Search, and then from there you can grab the respective code snippet for the configuration you've set within the playground.

Finally, just a really quick demonstration of the model. This is Gemini 2.5 Flash within an application that I built, and we can see it go through some various web tasks. If I ask it to build a SaaS platform, we can see those results. Now, one thing to note with the model is that it is quite fast, and what is really nice is having that ability to control the level of thinking, which I think a lot of developers will appreciate.

But otherwise, that's pretty much it for this video. If you found this video useful, please comment, share, and subscribe. Otherwise, till the next one!