
In the latest video, we discuss Google's recent advancements in AI, focusing on the release of the Gemini 1.5 Pro and Gemma 2 models. We examine their impressive performance on the LMSYS Chatbot Arena leaderboard, noting how they surpass competitors like GPT-3.5 and Llama 2 70B based on human feedback. An overview of the Chatbot Arena evaluation process and its emphasis on user-preferred responses is provided. We also share insights into the locally operable Gemma 2 2B model, the WebLLM project, and the utility of Google's Gemini models for different applications, including coding. Highlights include the innovation in consumer-friendly AI models and their potential use cases.

- 00:00 Google's Big Week: Gemini 1.5 Pro and Gemma 2 Release
- 00:24 Understanding the Chatbot Arena Leaderboard
- 02:04 Exploring Gemma 2 2B: Versatility and Performance
- 02:49 WebLLM: Running Models Locally
- 04:25 Evaluating Gemini Models for Coding Tasks
- 05:43 How to Try Out Gemini 1.5 Pro
- 06:29 The Competitive Landscape of AI Models
- 07:13 Conclusion and Viewer Engagement
---
type: transcript
date: 2024-08-01
youtube_id: nwXzIFfCgQc
---

# Transcript: Did Gemini 1.5 Pro Just Beat GPT-4o?

Google has had a pretty big past couple of days. In the past day they've released both Gemini 1.5 Pro Experimental, which is topping the LMSYS Chatbot Arena leaderboard, and we also have Gemma 2 that came out. The impressive thing with Gemma 2 is that it's even outperforming Mixtral and GPT-3.5, as well as Llama 2 70B, on this same leaderboard.

What's interesting with this evaluation is that it's based on human feedback. If you haven't used the Chatbot Arena, here's what it will do: if you just put in a query (we'll put in a simple one, we'll just say "hello world") and submit it, it will give you two different model responses side by side, and once they're complete you can choose which model you prefer. So if you look at the example here, we have a simple response to "hello world," and on the right-hand side we have a more involved one. You can just go and choose which one you think is better. In this case, let's just say I prefer a shorter response, and we see that the preferred response is from Claude 3.5 Sonnet. That's what's really interesting with this leaderboard: we're not trying to optimize on something like HumanEval or MMLU, the sort of general capabilities of the model; it's just looking at which responses individuals prefer.

The top spots before today were held by GPT-4o, followed by GPT-4o mini, and then finally Sonnet. What's interesting is that this doesn't necessarily mean Gemini 1.5 Pro is going to be the best model for answering questions on coding, or in the particular domain you might be interested in applying it to; it's just the general overall review of which responses are preferred. If you haven't checked this out, it's at lmsys.org, and there's a ton of really great information there to look through. I believe it's coming out of UC Berkeley, and they also have a number of sponsors that actually provide inference to support this and produce all of the different benchmarks that we see here.

Another thing I wanted to touch on that came out this week is Gemma 2 2B, and I'm just going to show you this model. What's great about this model, given its size, is that it runs really well even on consumer hardware: it could run in the browser, on your phone, on your computer, on hosted providers; it can run just about everywhere. Just to give you a demonstration of how it works, if I say "write five paragraphs on the solar eclipse," we'll see that we get the response streaming back really fast, and this is completely local on my machine. This opens up a ton of use cases, because all of a sudden we're able to use this model in the browser and on our PC, and mind you, it's also going to be very cheap to run.

Now, another thing I wanted to point out is this really cool project called WebLLM, which I encourage you to check out. If you want to try it, they have a version where you can just go and play around with it, and you can select the model you want to use. Let's say I want to use this new model from the Gemma family: you can go and find the 2B version (they also have different quantized versions in there that you can select), select it, and say "hello world." What will happen is that it will essentially load the model directly within the browser. It does take just a few seconds to load up and download the model, and mind you, I have a relatively quick internet connection, but once it's loaded you're going to be able to have a conversation completely within your web browser. It's running, again, completely locally on your system, just within a
different environment. If I again say "hello world" now that it's all loaded, we can see that, just like in the Ollama example, I'm getting a relatively quick response back here. What's interesting is that we have Apple Intelligence coming out, which is locally running models on macOS as well as iPhones; that is essentially a handful of these models fine-tuned for particular tasks. So this is an open-source equivalent where you could do something akin to that if you'd like. Mind you, it's obviously going to be harder within the Apple ecosystem, considering how closed down the actual devices are, but if you wanted to play around with this on your local machine and potentially fine-tune it for particular use cases, it's going to be a really great model to do so.

Now, with the Gemma series of models: while it does rank high on the Chatbot Arena, it won't necessarily perform as well when compared, for say a coding task, to something like Mixtral or GPT-3.5 Turbo. That was some of the initial skepticism I saw with this model when it came out. Even though you do have the preferred responses within the Chatbot Arena, just because it ranks high on this leaderboard doesn't necessarily mean that it's good at coding, or that it has strong general capability or a high MMLU score; it could just be that these are the overall preferred responses people want from an LLM. And it's an interesting question, right? Because as you saw in that first example, do you want a model that's going to give you a very verbose answer? It looks like an accurate answer, giving me all of the different ways you can write "Hello World" in different programming languages, but in this case I just wanted a simple response. Now, if I started to ask more coding-related questions, I'd imagine that Claude 3.5 Sonnet would often be the model that I would prefer. That's just from my experience using all of these different models;
that seems to be the model I gravitate towards in terms of actually adhering to what I'm asking for when I'm using it in a coding context.

Now, to try out Gemini 1.5 Pro, you can head over to aistudio.google.com, where they have it hosted and you can try it for free. This is also where you get an API key: you can just click "Get API key," walk through the steps, and grab your key. The other thing with this model is that it has the biggest context window; it allows you to pass in 2 million tokens of context. You'll definitely have to get very creative about how you're going to leverage 2 million tokens. Just to give you an idea, I couldn't even get to a million tokens when I passed in an entire S-1 document, which is all of this information; even with all of the HTML included, I couldn't come close to a million tokens of context. I encourage you to check it out; all of the different models from Google are on there.

It's really great that we now have a number of different organizations all ranking and leapfrogging one another week after week on these different evaluation metrics and leaderboards. It's not just OpenAI anymore: we have Google, we have Anthropic, we even have Meta's open-source 400B-parameter model. There's just a ton of different options to choose from, and at the end of the day it's really the developers as well as the consumers who win by having a number of different options. So kudos to Google for what they've put out over the past week; it's really impressive. They've moved the frontier on these really small models, and they're leading on the Chatbot Arena leaderboard.

I'd be curious: let me know in the comments how long you think Google is going to hold the top spot within the Chatbot Arena. Do you think it's going to be one week, two weeks, three weeks? Let me know in the comments below. But that's it for this video. If you found this video useful,
please like, comment, share, and subscribe. Otherwise, until the next one!
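The local Gemma 2 2B demo above can be sketched in code. This is a minimal sketch, assuming the model is being served by Ollama's REST API on its default local port; the video shows the model running locally, but the specific runner, endpoint, and model tag here are assumptions:

```python
import json
import urllib.request

# Ollama's default local endpoint (assumed; adjust if your runner differs).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> dict:
    """Build the JSON payload for a single, non-streaming generation."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str) -> str:
    """Send the prompt to the locally running model and return its reply."""
    data = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Requires a running Ollama server with the model pulled first, e.g.:
#   ollama pull gemma2:2b
# generate("gemma2:2b", "Write five paragraphs on the solar eclipse.")
```

Because the model runs entirely on your machine, the same pattern works offline and costs nothing per request, which is exactly what makes a 2B model interesting.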
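The 2-million-token context window mentioned for Gemini 1.5 Pro can be reasoned about with a quick back-of-the-envelope check. Below is a hedged sketch: the 4-characters-per-token heuristic is a rough assumption, not the model's real tokenizer, and the SDK calls are shown as comments since they need an API key from aistudio.google.com:

```python
CONTEXT_WINDOW = 2_000_000  # Gemini 1.5 Pro's advertised context window


def rough_token_estimate(text: str) -> int:
    """Very rough heuristic: roughly 4 characters per token for English text."""
    return len(text) // 4


def fits_in_context(text: str) -> bool:
    """Check whether a document plausibly fits in the 2M-token window."""
    return rough_token_estimate(text) <= CONTEXT_WINDOW


# With an API key from aistudio.google.com, the google-generativeai SDK
# gives exact counts and completions:
#   import google.generativeai as genai
#   genai.configure(api_key="YOUR_API_KEY")
#   model = genai.GenerativeModel("gemini-1.5-pro")
#   print(model.count_tokens(document))               # exact count from the API
#   print(model.generate_content("hello world").text)
```

As the video notes, even an entire S-1 filing with its HTML came in under a million tokens, so 2 million is a budget you have to get creative to exhaust.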