
In this video, the presenter provides a comprehensive overview of Gemma, Google's new open models. It details the features and intended applications of the two model sizes, and discusses their benchmarks in comparison with existing models such as Gemini, Mistral, and Llama. The video also covers several ways users can interact with the Gemma models across different platforms, including Kaggle, Google Cloud, Hugging Face, Perplexity Labs, and Ollama. Finally, it encourages users to try out these models, highlighting their potential for powerful queries and flexibility for use in individual applications.

Chapters:
- 00:00 Introduction to Google's Open Source Models
- 00:25 Understanding the Models' Intention and Benchmark
- 00:38 Criticism and Marketing Material of Google's Models
- 00:58 Technical Report and Performance Metrics
- 01:43 Testing and Comparing the Models
- 02:06 Interesting Facts about the Models
- 02:24 Getting Started with Gemma
- 02:38 Exploring Different Platforms for Gemma
- 03:40 Google's Approach to Open Source Models
- 04:00 Impressive Performance of Mistral AI
- 04:31 Technical Details of the Models
- 04:56 Interacting with Models on Hugging Face
- 05:41 Exploring Perplexity Labs and Ollama
- 06:53 Running Models Locally with Ollama
- 10:29 Conclusion and Viewer Engagement

Links:
- https://blog.google/technology/developers/gemma-open-models/
- https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf
- https://ai.google.dev/gemma#benchmarks
- https://ai.google.dev/gemma/docs
- https://www.kaggle.com/models/google/gemma
- https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335?pli=1
- https://huggingface.co/google/gemma-7b-it/discussions
- https://huggingface.co/chat/conversation/65d6b5802aa2c70802c8e0a2
- https://labs.perplexity.ai/
- https://github.com/ollama/ollama/releases
- https://docs.perplexity.ai/docs/rate-limits
- https://ollama.com/library/gemma
- https://js.langchain.com/docs/integrations/llms/ollama
---
type: transcript
date: 2024-02-22
youtube_id: Jm1BlI6Y5VI
---

# Transcript: Google's Gemma Open-Source Models

In this video I'm going to be showing you the new open-source models by Google. There are two models, and they're built on the same research and technology that were used to create Gemini, which is their flagship family of models. In terms of the intention of these models, the 2B model is really intended for mobile devices and laptops, and the 7B model is intended for desktop computers and small servers. If we take a quick look at the benchmarks, the MMLU score measures the general knowledge of a language model. When the Gemini Ultra model came out, the MMLU score it touted was better than GPT-4's. That caught some flak, and similarly this model is catching a little bit of criticism. When Google puts out a model, one thing to note is that this is marketing material: on their blog post, the chart they decided to publish makes it look like Gemma is outperforming on all the different metrics across the board. But there is a much better way to look at all this. If you check out their technical report, which I'll link in the description of the video, they pull up both Llama 2 and Mistral in the comparisons, and Gemma does perform very well across many metrics. If we look at the far right-hand column, the Gemma 7B variant does look to be the most performant across most metrics. With that being said, there are a handful of metrics within this chart where Mistral and, respectively, Llama 2 do outperform it, so Gemma might not necessarily be a drop-in replacement for Mistral or Llama, depending on the task. I definitely encourage you to try out all of these different models, and if you have something running in one of your applications, say you're using something locally like Mistral 7B, or if you're using that on an application that you might
have, just try testing it: maybe do a bit of a canary test to see how well something like Gemma performs on some of the same tasks. So maybe don't swap it out all at once; just get a sense of how it performs. One of the things I found interesting within the technical report is that the 7B model is actually closer to 8 billion parameters. So while the comparisons show it as a 7B model, it is awfully close to 8 billion parameters, just something to take with a grain of salt. Now in this video I also wanted to show you how to get started. There are a number of different ways you can get started with Gemma right now. Google referenced a few different examples; I'm going to show you how to get set up on your local machine, as well as a couple of other options that weren't mentioned. If you head over to Kaggle, Vertex AI, or Hugging Face, you can get started. On Kaggle you can just sign up for an account and get started there. Similarly, on the Google Cloud platform, if you haven't signed up before, you'll be able to get, I believe, $300 worth of free credits where you can play around with the different models. The thing with the Google Cloud platform is that within their Model Garden you're not limited to using something like Gemma: you could use the Gemini Pro Vision model or the Gemini Pro model and interact with all sorts of different models, including other open-source models as well. So that's a good option if you want to play around with a bunch of different models, and in general Google Cloud and their platform offer a ton of different services. The next platform you can check out, because it's an open model, is Hugging Face. You can check out the model card on Hugging Face and see all the details on how it works. One thing to note that I don't think I mentioned is there are two model variants available: one is
pre-trained, and then there's also an instruction-tuned variant. It's really interesting to start to see other players enter the open-source arena. Obviously we saw Meta come in with their Llama 2 models last year, and that really garnered a lot of support from developers and the open-source community, and Meta got a lot of good attention from doing so. It looks like Google is taking a similar approach to what Meta did last year. Another thing to note, if we head back to the technical report: the really interesting thing with all of this is Mistral. The company behind Mistral, from what I understand, is only about 30 people, and when you compare that to something like Meta or Google, which are companies with tens of thousands of people working for them, it just goes to show you how impressive the Mistral AI team is, and it's definitely a team to keep an eye on with all of these different models coming out. If you're interested in some of the technical details of the model, I'm going to link out to the paper in the description of the video where you can take a look. It's not a super long read; it's 16 pages, very short and concise, and you can get an overall look at how the model itself was built. So if you're interested in the model architecture, how it was trained, and all of that, it's a good paper to check out from the Google DeepMind team. Hugging Face is an option where you can check out a ton of different information about the model. There's also a community tab where you can discuss the model, or if there are certain issues you might be having, you can talk about them there. Another option you have is HuggingChat: you can head over to HuggingChat and select the model you want to use before you start the conversation. There are a number of different models you can interact with, so if you choose the Gemma model, like I will, and I just say "hello world", you'll see that it has
like a ChatGPT feel and GUI for how you can interact with it. You can say "write me a Node.js application" and you can see it has that nice little markdown editor where you can copy the bits of code as well, so it's a nice little interface to play around with the different models. Similar to this interface is Perplexity Labs, and the nice thing I like about Perplexity Labs is that the inference speed is really fast; also, some of their closed-source models let you interact with an LLM that has access to the internet. If I say "write me a short story", you can see from the response that it was very fast: it responded at over 200 tokens per second. Obviously this is the smaller model; if I try the 7B model and say "write me a story", while it's not as fast as the 2-billion-parameter model, it's still clocking in at over 100 tokens per second, so it's a really great option for interacting with this model if you'd like. While Perplexity has a really well-known application that works really well, they also have a really great API that they released in the fall. While this model isn't in the API documentation quite yet, I wouldn't be surprised if querying the model just works, and if it's not there at the time of recording, I wouldn't be surprised if the model shows up in short order. The last option I wanted to show you is Ollama, which is a really easy way to get started with running these large language models on your computer. You can go ahead and download it, head over to the models page, and pick the different models you want to interact with. In this case, if we click Gemma, all you need to do once you have Ollama installed is open your terminal and run `ollama run` with the 2B variant or the 7B variant, so you have the option to run either of those right from your computer and
the other thing that's nice with Ollama is that there's widespread integration for it across a ton of different platforms. You can use it with LangChain, you can use it with LlamaIndex, and it's really easy to set up. Essentially, what Ollama does is set up a local inference server where you can query different models. So if you have Llama 2 installed, or if you have Gemma installed, you can use a simple SDK wrapper to begin querying it and integrating it into your applications. Say you have an application and you want to leverage all of the compute you have on your laptop or your home PC or whatever; you can do that all from here. Now, one thing to note for the Gemma model itself: if you already have Ollama installed and you try to pull down the Gemma model and it doesn't work, you'll have to get the 0.1.26 version. If you head over to GitHub (I'll put all the links for this in the description of the video), you can pull down the correct version for your operating system, install it, and then you'll be able to pull down the model and run it with the newest version. Just to show you how Ollama works: once you have it installed, you can go to your terminal and simply type `ollama run gemma`, or you can specify the model size you want explicitly. If you just run `gemma`, it's going to default to the 2B variant, and once you're in there you can say "tell me about our solar system", and it starts telling us about the solar system. A little bit about my machine: I have a newer MacBook, so I get pretty good local inference speed, but I also have older hardware, and it performs pretty well across pretty much all the different hardware that I've thrown it at. I haven't really been too disappointed with the model speed unless I'm running a bigger model, like a 34B model, or attempting to run something like a 70B model. I haven't run into any issues when trying
to run inference locally with models smaller than 7B, or even 13B, on most of the machines that I have. Another thing I was playing around with earlier is asking it to return structured output. If I say "always return responses in JSON with the key message", and then prompt "tell me about our solar system", well, just to remind you, this is a really small model, and for a really small model this actually impressed me. One of the things I found early last year when playing around with GPT-3.5 is that getting responses like this consistently was often pretty difficult, and when I was playing around with this I got a decent hit rate for consistent responses. It didn't have a 100% hit rate; sometimes it would return a plain string, or it would say something like "I can't answer that message". But being able to play around with these models, and potentially leverage them for parsing queries and having them return JSON to you, is going to be a really powerful use case for these smaller models if they can work consistently, or consistently enough, I should say. So that's pretty much it for this video. I just wanted to give you a brief overview of the new Gemma open models. Let me know what you think of the Gemma models; I want to hear the good, the bad, and the ugly. I'd be curious to hear all your thoughts in the comments of this video. That's it for this one. If you found this video useful: like, comment, subscribe, until the next one.
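The JSON-mode experiment above can be sketched in code. This is a minimal sketch, assuming Ollama is installed with a Gemma model pulled and its local server running on the default port 11434; the endpoint and payload fields follow Ollama's `/api/generate` API, but the helper names (`build_payload`, `extract_message`, `ask`) are my own, and the compliance check mirrors the "always return JSON with the key message" prompt from the video:

```python
import json
import urllib.request

# Ollama's default local inference endpoint (assumes `ollama serve` is running).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt, model="gemma:2b"):
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        # The system prompt from the video's experiment:
        "system": 'Always return responses in JSON with the key "message".',
        "stream": False,  # one JSON object back instead of a token stream
    }

def extract_message(raw_text):
    """Check whether the model complied: return the "message" value, or None
    if the model answered with a plain string / malformed JSON instead."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    return data.get("message")

def ask(prompt, model="gemma:2b"):
    """Send the prompt to the local Ollama server and validate the reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Ollama puts the generated text in the "response" field.
    return extract_message(body["response"])
```

As in the video, don't expect a 100% hit rate from a 2B model: `ask(...)` returns `None` whenever the model drifts back to plain text, which is exactly the case you'd retry or fall back on in an application.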