
# Exploring LM Studio: A Guide to Running AI Models Locally in 7 Minutes

This video tutorial introduces LM Studio, a comprehensive application that allows users to run a variety of AI models locally, including Llama 3, Falcon, and Mistral models. The guide covers how to download and install LM Studio for the user's machine, navigate the interface to search for and download models, and use the AI chat interface and playground. The video demonstrates downloading models, including Llama 3 and LLaVA, and shows how to use them for coding and multimodal sessions. LM Studio's ability to function as a local inference server is also highlighted, letting users incorporate local models into their applications without needing an external account with OpenAI, AWS, or GCP. The tutorial also addresses system limitations when running multiple models and the opportunity to expand access to powerful, open-source AI models.

Chapters:
- 00:00 Introduction to LM Studio: A Powerful Local Model Application
- 00:13 Getting Started with LM Studio
- 00:44 Exploring the LM Studio Interface
- 01:33 Diving Into AI Chat and Model Selection
- 02:00 Technical Specs and Coding with LM Studio
- 02:50 Utilizing Multimodal Models and Local Inference
- 05:13 Maximizing Model Performance and System Limitations
- 06:22 Conclusion and Alternatives to LM Studio
---
type: transcript
date: 2024-05-20
youtube_id: SUeIsSML2UY
---

# Transcript: LM Studio: Run Local LLMs in 7 Minutes

In this video I'm going to be showing you LM Studio, an incredibly powerful application that lets you run a ton of different models locally, from Llama 3 to the Phi models to the Falcon models to the Mistral models; the list just goes on. To get started, head over to LM Studio and select the installer for the machine you're currently running on. Once you have it installed, you'll see a screen just like this. It's as easy as selecting one of the models you see on the dashboard. You can search for a model within the search bar here, or you can even paste in a Hugging Face repo URL. If we want to get started with the Llama 3 8B Instruct model, we can simply go ahead and download it. I have a relatively quick internet connection, so this model, even though it's about 5 gigs, should download relatively quickly.

While we're waiting for the model to wrap up, I want to do a little bit of a high-level overview of the interface. On the left-hand side you have the search interface, where you can go ahead and search for different models. Say I want to search for Llama-based models: it will show all of the different models that are available for you to download. It's also nice to have the number of times each model has been downloaded, as well as the number of times it's been liked, right within the interface, so you have a general idea of how popular these various models are. There's also a fully featured AI chat interface, which I'll go into in just a moment, and there is a playground as well. The other nice thing with LM Studio is that you're not just limited to text-based LLMs; you can also use multimodal models such as LLaVA. The last item within the sidebar is the list of models that you have locally.

Now that we have this downloaded, if we just go over to the AI chat interface we can get started. Once you have a model installed, you can go to the top of the screen and select the model you've downloaded. We can also set a new system prompt; here we just have the default one: "You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability." In terms of the specs on my machine, I have an M3 MacBook Pro with 18 GB of RAM.

One thing I wanted to point out: say you're using this for coding and you ask it to generate a Next.js boilerplate. The one thing you'll notice right off the bat is that when it starts to return text, it doesn't actually interpret the Markdown. But to see it with nice syntax highlighting, like we're used to with these ChatGPT-like interfaces, you can just click the Markdown button, and all of a sudden you have it all nicely styled with syntax highlighting. There is also a monospace option if you'd like, so you have a few different options to play around with. For coding use cases I'd definitely encourage you to use Markdown, but otherwise plain text might do if you're just using it for questions you might have.

If I search for LLaVA here, I can go ahead and download a LLaVA model and see all of its specs. If I want to try out the LLaVA Phi-3 Mini model, I see it's only about 2 gigs, so it might be an interesting model to explore. As you can see, the interface is really intuitive; you can just download a model and start to play around with it.

The other great thing with LM Studio is that it has a local inference server that you can run. While there is this nice interface for trying out all of these different local models, let's say you have an application and you want to see how it runs leveraging the local models you have downloaded: you can use this as a local endpoint and make requests to localhost. It's the chat completions endpoint in this example, but you can also use it for different things: if you want embeddings, you can use an embeddings model; if you want chat completions, you can use a chat model. The thing that's nice about this is that there are a ton of people interested in open-source models, and there are also restrictions for some people who want to try out these different LLM applications but might not be able to tie a credit card to an OpenAI account, an AWS account, a GCP account, or what have you. Being able to run this all locally is really empowering, because you're able to build out these applications, and especially with something like this, where you have a local endpoint, it opens up the opportunity where these powerful models, which are becoming smaller and smaller while becoming more capable, become more accessible as a result.

Now that we have the LLaVA model installed, we can go over to the playground tab, where we have a multimodal session. Say I want to select the LLaVA model: if you're not sure of the preset, you can just click the default LM Studio macOS preset, and then we can load up the model. You have the option to reduce or expand the CPU and GPU utilization, and there are a number of different things you can grab here. Say you want the API model identifier, for example if you're setting this up within an application like we saw with the local server: you might want to go ahead and grab that model identifier.

Say I want to have both Llama 3 and the LLaVA model running. The one thing you have to be mindful of is your system's limitations. If you have an application with embeddings and also a vision model, having those run concurrently could be, and likely will be, an expensive process for your machine. The more models you run, the slower things can get, and it can get really slow if your application is doing all sorts of things. You can imagine an application where you're embedding things and then quickly pivoting to a chat completion for local inference; that can be quite expensive, since it takes some time to pivot back and forth, as you saw when these models were loading in. Just something to be mindful of.

Once you have them all loaded in, you can start the server just like this, and then you can interact with your local server, whether through a curl request or some of the other options here. They have a number of different Python examples within here; it'd be nice to eventually see some other programming languages as well, but nevertheless there is the curl command as well as these examples, which you can easily translate into the other languages you might be using.

I just wanted to do a quick video introducing you to LM Studio. This is just another really great option for running models locally. There are other options out there, such as Ollama, which is pretty popular, as well as Jan, which I also encourage you to check out; both are doing similar things, just in different ways. That's it for this video. If you found it useful, please like, comment, share, and subscribe; otherwise, until the next one.
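As a rough sketch of what talking to the local server looks like: LM Studio's server exposes an OpenAI-compatible API, by default on localhost port 1234. The model identifier below is a placeholder; grab the real one from the playground as described above, and adjust the port if you've changed the server settings.

```python
import json
import urllib.request

# Assumed default: LM Studio's local server listens on port 1234 and
# exposes OpenAI-compatible routes such as /v1/chat/completions.
BASE_URL = "http://localhost:1234/v1"

# Placeholder model identifier; use the API model identifier shown in
# LM Studio for the model you actually have loaded.
payload = {
    "model": "your-model-identifier-here",
    "messages": [
        {"role": "system",
         "content": "You are a helpful, smart, kind, and efficient AI assistant."},
        {"role": "user", "content": "Generate a Next.js boilerplate."},
    ],
    "temperature": 0.7,
}

def chat(payload: dict) -> dict:
    """POST a chat-completion request to the local endpoint and return the JSON reply."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example usage (requires the LM Studio server to be running):
# reply = chat(payload)
# print(reply["choices"][0]["message"]["content"])
```

Because the API shape mirrors OpenAI's, the same payload works with the official OpenAI client libraries if you point their base URL at the local server instead.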