
Exploring Google's Advanced Gemma 2 AI Models and Exciting Updates

In this video, I delve into Google's newly released Gemma 2 AI models, including the 9 billion and 27 billion parameter versions. I highlight the impressive performance of the 27 billion parameter model, which outperforms other leading models like Llama 3 70B and Claude 3 Sonnet on the Chatbot Arena Elo leaderboard. I also discuss their availability on Hugging Face, Ollama, and Google Cloud, including their compatibility and deployment options. Additionally, I cover the upcoming 2.6 billion parameter model and various user-friendly ways to try these models through AI Studio. Finally, key updates and features of the Gemini 1.5 Pro model, including a 2 million token context window and free code execution, are explored in detail. Stay tuned for more advancements in AI technology!

00:00 Introduction to Google's Gemma 2
00:05 Performance of the 27 Billion Parameter Model
01:43 Accessing and Running Gemma 2
02:13 Technical Specifications and Benchmarks
03:19 Recent Developments and Announcements
05:23 Gemini 1.5 Pro Model and Code Execution
06:52 Conclusion and Final Thoughts
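The Chatbot Arena Elo mentioned in the summary comes from pairwise preference votes: a user sees two anonymous model responses and picks the one they prefer. As a rough sketch of how a single vote nudges ratings, here is a simplified Elo update; note this is an illustration only, since the actual leaderboard methodology is based on a Bradley-Terry model rather than sequential Elo updates.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Update both ratings after one pairwise preference vote."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# Two models start at the same rating; the preferred one gains points.
a, b = elo_update(1200.0, 1200.0, a_won=True)  # → (1216.0, 1184.0)
```

Aggregated over many such votes, a model that users consistently prefer climbs the leaderboard, which is the sense in which Gemma 2 27B "outperforms" larger models in the Arena.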
---
type: transcript
date: 2024-06-30
youtube_id: dx4gk9lOUys
---

# Transcript: Gemma 2 Aims for the Crown! Google's Latest Open-Weights 9B & 27B Models

Google has released Gemma 2: there's a 9 billion parameter as well as a 27 billion parameter model, and you're able to access it on Hugging Face or Ollama right now. But the thing that is really interesting, and that I want to focus on first, is the outsized performance of the 27 billion parameter model. One of the charts that I found interesting, which isn't within the blog post (though they do show the metrics there if you'd like to see them), is how it scores on the Chatbot Arena. Essentially, when you put in a message, it streams out two different responses from two different LLMs; you don't know which LLM is which, and you simply choose which response you prefer. Within that setup, the Gemma 2 27B model even outperforms Claude 3 Sonnet in terms of preferred responses, as well as Llama 3 70B. This model is going to be less expensive to run while still having really good performance.

As for the 9 billion parameter model, if we take a closer look, it outperforms both the Mistral 7B model and the Llama 3 8B model. It is a slightly bigger model than both of those, but if you compare across the board, it outperforms both by a wide margin. One thing I wanted to point out is the HumanEval score, which is something a lot of people focus on with my channel since it is generally coding related: this is a pretty big jump from Mistral as well as from Gemma 1. The other interesting thing is that a 2.6 billion parameter model is coming, and as you might imagine, it has improved performance, quite a significant jump from the Gemma 1 model that was released not too long ago.

The easiest way to
access the model, if you want to try it out, is to go over to aistudio.google.com and select it from the dropdown on the right-hand side. Right now they just have the 27 billion parameter model in there, so if I say "hello world" and submit that, you can see that relatively quickly you get your response back, and you can play around with it just like you would with any other chatbot.

To run through the blog post a little: it is available on Ollama right now. If you have Ollama installed, you can run `ollama run gemma2` to pull it down. The 27 billion parameter model is able to run on a single NVIDIA H100 Tensor Core GPU. Compare that to something like Llama 3 70B, where you wouldn't be able to do that; with this model you get performance on par with or exceeding Llama 3 70B on a single H100. They lay out the benefits of this model size and mention that you can run it on a Google Cloud TPU or on an NVIDIA A100 or H100 GPU while still maintaining that high level of performance. They also have some benchmarks within the blog post itself.

A couple of other things I want to highlight because they are important: Gemma 2 is available commercially under the Gemma license, which you can check out for more details; that is always a question a lot of people have when these quote-unquote open source models come out. There's also broad framework compatibility, and starting next month, Google Cloud customers will be able to easily deploy and manage Gemma on Vertex AI. You can check out the Gemma cookbook if you'd like, and if you're a first-time Google Cloud customer, you can apply for $300 worth of credits to play around with it. Another interesting thing: NVIDIA just recently released the Nemotron 340 billion parameter model, and Gemma 2, the 27 billion parameter model, outperforms it. So even something with a ton of parameters, which would be relatively
expensive to run, you can now get on just a mere 27 billion parameters. You can also access this on Fireworks AI if you'd like; if you are a Fireworks user, you can access it there, and they have a really great API.

What's interesting with this model is what Artificial Analysis have laid out (I encourage you to follow them if you don't already): they point out that Gemma 2 excels within the Chatbot Arena but lags across a number of other evaluations. It indicates that the model has really good strength in communication but lesser reasoning abilities when compared to Llama 3 70B. This is likely because of the new reward model, which is really focused on multi-turn conversational abilities. Because Gemma 2 is quite small, it will also be very fast, likely twice the output speed of some of the other models like Llama 3 70B, and that's another consideration.

Now, one thing with the model is that it only has 8,000 tokens of context that you can pass in, which I think is going to be a bit of a sticking point for some people. But there are methods to extend that context length if you'd like; I'd imagine that if they don't already exist, extended-context versions will probably show up on Hugging Face in just a number of days, where you'll be able to use one of these models with a larger context window.

This is also one of the first models set up to be rewarded for the types of conversations you have with a chatbot. We have all of these different metrics like MMLU or HumanEval and what have you, but what we haven't really seen before, instead of just chasing higher numbers on MMLU or HumanEval, is models actually trained for these multi-turn conversations, like you see laid out in their paper. Check it out on AI Studio, try it on Ollama; there are a ton of different ways you can access this.

The other thing I wanted to point out is that they came out with some other exciting announcements yesterday. The Gemini 1.5 Pro model is now
available to everyone with 2 million tokens of context, which is funny and a stark comparison: their open-weights models ship with this really small context window, all the while their flagship Gemini 1.5 Pro model has a 2 million token context window.

Another thing they put out is that code execution is now available directly within Gemini 1.5, and it is completely free. You're able to actually execute Python code if you're using their developer API, and you don't incur additional costs for using the code execution tool. You'll still be billed for the input tokens and the output tokens like you typically would, but you don't have to pay a premium for leveraging this code execution feature, which is a really great benefit and offering. They've also started to roll out fine-tuning for Gemini 1.5 Flash, which, at inference time, from what I understand, is going to cost the same as using the base model.

I just wanted to highlight the code execution piece: I found an interesting example of someone already leveraging this within Google AI Studio. They asked Gemini 1.5 Pro, with code execution enabled, to download a dataset, and it was able to download the dataset and produce the data they had requested, in the format they wanted, all within the interface.

That's it for this video. If you found it useful, please like, comment, share, and subscribe. Otherwise, until the next one!
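As a closing note on the code-execution feature discussed above: the underlying tool pattern is that the model writes Python, a runtime executes it, and the printed output is fed back into the model's context. Below is a toy, in-process sketch of that loop; Gemini runs generated code in an isolated sandbox, not via a bare `exec` like this.

```python
import contextlib
import io

def run_generated_code(code: str) -> str:
    """Execute model-generated Python in a throwaway namespace and capture stdout.

    Toy illustration only: a real code-execution tool runs this in an
    isolated sandbox, never in-process with exec().
    """
    buffer = io.StringIO()
    namespace: dict = {}
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    return buffer.getvalue()

# The tool loop: the model emits code, the runtime executes it,
# and the printed result is returned to the model's context.
generated = "total = sum(range(1, 101))\nprint(total)"
result = run_generated_code(generated)  # → "5050\n"
```

Never execute untrusted, model-generated code this way in practice; proper isolation is the whole point of a hosted code-execution tool.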