
In this video I show you LLAMA 2, Meta AI's newest open-source model that can be used for commercial use (so long as you have less than 700 million active users). I show you a host of different options to get running LLAMA 2 locally, to use it from an API on Replicate, or to try it with Vercel, HuggingChat, or Perplexity AI!

Links:
- https://ai.meta.com/llama/
- https://github.com/oobabooga/text-generation-webui
- https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML
- https://replicate.com/replicate/llama70b-v2-chat
- https://replicate.com/docs/guides/fine-tune-a-language-model
- https://huggingface.co/blog/llama2
- https://huggingface.co/chat/
- https://sdk.vercel.ai/
- https://llama.perplexity.ai/
---
type: transcript
date: 2023-07-20
youtube_id: At-SmW1uxZw
---

# Transcript: LLAMA 2 - Get Started With Meta's Newest Open Source ChatGPT Contender

All right, in this video I'm going to show you a host of different options you can use to get started with LLAMA 2. LLAMA 2 is a family of models recently released by Meta that now allows you to use their large language model for commercial use. Prior to LLAMA 2 you were restricted from actually commercializing a product built on their language model. The big thing here is that, so long as you have (and the magic number is) less than 700 million users, you'll be able to use this for your application. I think I can speak for all of us: if you're watching this video, that's most of us, and we'll be able to build something we can actually use for commercial purposes, which is awesome.

One thing to note with LLAMA, and I think part of why so many people are excited about this, is that it's open source, and I just want to touch on that for a moment. Developers, myself included, love when things are open source. If you think about it for a moment, there's probably a host of different open-source libraries that you're using or have used: anything from Linux running on your Android phone to the projects you're actively developing, where you `npm install` all those different dependencies. So open-sourcing this and having a community of developers and engineers work on it, fine-tune it, refine it, and create different versions of it is huge. And if we just look at some of the benchmarks, this is by far the best open-source model in a host of different areas. I'm not going to touch on all of these benchmarks, but depending on your use case, just look through them. Part of me wishes these benchmarks were a little clearer in terms of what they're actually measuring,
but if you read through these, you might already be familiar with some of them and what they measure. You can see within this column how it stacks up against a couple of other popular open-source models, like the MPT model and Falcon, and you can see how it compares to LLAMA 1, where it's the 70B model being compared here.

The first thing I'm going to show you is how to pull this down and set it up locally. The tool I found for this is the text-generation-webui. The easiest way to set it up is to go to the GitHub repo and scroll down to their one-click installers; that was the approach I took. If you want to walk through setting it up more manually, you can do that as well, but I found that I downloaded the installer and it just runs. You can go and execute that shell script, and it will run through downloading everything you need to get going. Once you have that downloaded, you can fire it up and you'll see it running locally. The nice thing with this is that you'll be able to download custom models here. I don't actually have one downloaded just yet, but if you wanted to, you can just go over to Hugging Face. I'll go back here for a moment, because if you want to use the model directly from Meta, you do have to request access; you can see I've requested access and still don't have it. But you can go and find some other options to use it locally. You'll see there's this user, TheBloke, with a model you can pull down. I tested this just before this video: I copied the model name, put it in here, and it started to download. One thing to note: it does give you the option to run on a GPU or your CPU, so depending on your hardware you can specify that. While you're running through the install, it will prompt you in the terminal with which option you'd like, so just a heads up on that. As for the actual files you'll need: if you're running low on storage space like I am (which is why I'm not demonstrating this), you can see the amount of storage space you need to run it, and you can look through some of the different options for using it as well if you'd like. As you see here, this is the text-generation-webui I showed you just a moment ago. I'll have links to everything in the description of this video if you're curious about any of these repos or services.

Next I'm going to show you the option from Replicate. Replicate released this pretty soon after LLAMA 2 came out, and if you're not familiar with Replicate, it's a very easy way for Python and Node.js developers to access models. There's a whole host of models on their platform, from text-to-image models like Stable Diffusion to large language models like this. It gives you a nice little playground you can play around with, where you can change the max length, temperature, and so on, similar to the OpenAI API playground, and then just go ahead and submit a query. You don't even have to sign in; you can see I'm not signed in here, and it's been given a prompt and is running through it. It's not super fast or snappy, necessarily, but it gives you an idea if you want to play around with some of the more nuanced options for the model. Replicate also makes it really easy to use for, like I mentioned, Node.js or Python developers: you can install their package, grab your API key, and then be essentially off to the races. It works very similarly to the Hugging Face Node.js wrapper, where you can just reach for a model if it's available for inference. Essentially, you can use it
as if it were an API. So you can go ahead and use this if you'd like. There are some pricing options for using their API; I haven't run into the point where it actually bills me, but if you'd like to play around with Replicate, you can check out the pricing, which is pretty well laid out. One thing to note with pricing is that there's a difference between the fine-tuning and training options and actual inference, so just be mindful of that when you're looking through it.

Next, another thing with Replicate while I'm on their services: they just open-sourced a framework to tune the model. What I found interesting with this is that it gives a bit of an example of the cost it might take to further fine-tune a model, and the amount of hardware involved. You can read through this if you'd like to take something like the LLaMA model and fine-tune it further, if you already have an idea of how you want to use it for your particular use case. Really interesting. I haven't actually used this, but if you have a product you want to ship and you need a fine-tuned model that's open source, this is probably a good option to consider.

Okay, so next: Hugging Face. Hugging Face does have a sort of ChatGPT-like competitor, similar to Bard or Anthropic's Claude 2 model, with a little web GUI where you can just go to the site and it starts to work. The thing with their Hugging Face Chat is that it has the ability to search the web, which is really nice. If you'd like to tie this model in with that functionality, switch it on and then ask a question. A common one people use is "what is LangChain?" If you put that into something like ChatGPT, it might hallucinate and try to say it's some blockchain-related thing, when it's not, right? You can think of it as an LLM framework. You can see here it also gives you the breakdown of what it's doing, and you can see it's saying "what is LinkedIn" rather than "what is LangChain", so not exactly what I want. Let's just try it one more time... oh, and it's asking me to sign in, so I'm not actually going to sign in here. That wasn't the best example of it, but it gives you a really nice user interface if you'd like to play around with it. I'd encourage you to try it both with and without searching the web; let me know in the comments if you had better luck than what I just demonstrated, or if you find it's not working well. Try both and let me know either way. Very nice interface, and it also gives you that ChatGPT-like experience where it saves your chats on the left-hand side.

Okay, so next is the sdk.vercel.ai site, which is awesome. I love what Vercel is doing across the board in terms of development and their platform. They do sometimes get criticism for being expensive, but considering what it can do, it's amazing for what you get. The thing with Vercel, and they released this about a month ago or so, is that it gives you the ability to compare across different models. So if I wanted to select, say, LLaMA 13B and LLaMA 7B v2, and then compare them to GPT-3.5 Turbo, I can say "tell me a short story", just something broad. You can see GPT-3.5 is outputting right away, streaming those responses; the 13B model is taking a little bit longer and wants to give me a shorter response here. Just looking at all these different options, let's say "tell me a scary story", and you see it's hanging here, which is interesting. That didn't look like it was streaming the responses, but you can see it does give me a story. I'm not going to read through this, but this is just a great option
for comparing different models, if you'd like to do that.

And then finally, this is the one I found to be the best for inference: this one is really, really fast. I haven't actually seen an inference web UI with output this fast. If I just show you, let's go with their small model and say "write me a short story". You can see here it's outputting incredibly fast, and it even gives you the speed. Perplexity is sort of interesting to think about in terms of what they're doing behind the scenes; it's probably a combination of throwing more hardware at it, but it also seems like there might be some breakthroughs on the inference side to make it as fast as it is. If I say "make this story longer", you can see it's really, really fast, similar to GPT-3.5 by the looks of it. But when I was using this on my phone, the streaming text response was almost too fast: I'd almost want a couple of sentences at a time and then have it load in the background. I don't know exactly the UI that would resolve that, but it can feel a little overwhelming with it streaming out all those responses. It'll be interesting to see, once these models get really fast and can output a whole blurb of text in one shot, what that UI will look like. Will it load a sentence at a time? Because, as humans, we can only read so fast. Just a side thought.

So, hopefully you found this useful; there's a handful of options to get started. There is a 70-plus-page white paper you can read on the nuances of the LLAMA models: how they were trained, how they were fine-tuned, and all the different things they've done for safety and whatnot. There are also a handful of videos on YouTube you can check out with great resources; there are a lot of other YouTubers doing amazing work covering this sort of stuff, but I wanted to do a different take and just show you what's out there. So if you found this video useful, please like, comment, share, and subscribe, and otherwise, until the next one.
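As a footnote to the Replicate option above, here is a minimal sketch of what the Python side might look like. The `build_input` and `collect_stream` helpers are illustrative, not part of the Replicate SDK; the input field names mirror the playground knobs shown in the video (temperature, max length) and are assumptions about the model's actual schema. The model slug comes from the links in the description, and the actual network call (commented out) requires the `replicate` package and a `REPLICATE_API_TOKEN`.

```python
# Sketch of calling a LLAMA 2 chat model on Replicate from Python.
# The two helpers below are local illustrations (assumptions, not SDK code).

def build_input(prompt: str, temperature: float = 0.75, max_length: int = 500) -> dict:
    """Assemble an input payload with the playground-style knobs."""
    return {
        "prompt": prompt,
        "temperature": temperature,
        "max_length": max_length,
    }

def collect_stream(chunks) -> str:
    """Replicate streams output as an iterable of text chunks; join them."""
    return "".join(chunks)

# Actual call, requires `pip install replicate` and REPLICATE_API_TOKEN set:
# import replicate
# output = replicate.run(
#     "replicate/llama70b-v2-chat",  # model slug from the video's links
#     input=build_input("Tell me a short story."),
# )
# print(collect_stream(output))
```

The join step matters because, as shown in the playground demos above, these models stream their output token by token rather than returning one block of text.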