
Dive into the world of AI with this tutorial: "Building a Cross-LLM Voice Assistant in 12 Minutes in Next.JS". This step-by-step guide takes you through creating your own personalized voice assistant, similar to Siri or Google Assistant, but with a powerful twist: it integrates multiple large language models (LLMs) such as Mistral-7B, Mixtral, GPT-3.5, GPT-4, Perplexity, and Llama 2!

What You'll Learn in This Tutorial:
00:00 Intro - Combining AI tech with Next.js for a dynamic voice assistant.
00:13 Setup - Initializing the Next.js app and securing API keys.
00:52 Hooks Basics - Role and setup of React.js hooks.
01:01 Hooks Implementation - Crafting dynamic hooks for voice interaction.
01:41 Core Function - Building the main function and managing loading states.
02:34 Audio Management - Handling audio files and errors.
03:07 Model Setup - Speech-to-text integration and model bubble creation.
03:19 Silence & Keywords - Detecting silence and responding to keywords.
04:47 Speech Recognition - Incorporating WebKit speech recognition.
05:24 JSX & Rendering - Setting up JSX and rendering model bubbles.
06:13 New Routes - Adding routes in Next.js for varied functionalities.
06:24 SDK Initialization - Starting the Perplexity SDK and managing dependencies.
06:59 Environment Setup - Configuring environment variables and OpenAI.
07:57 Post Handler & Intro - Establishing post handlers and crafting intro messages.
09:09 Model Integration - Setting up and switching between AI models.
10:57 Perplexity API - Engaging with the Perplexity API.
11:37 Messaging & JSON - Creating messages and returning JSON data.
12:16 Wrap-Up - Concluding insights and next steps.

Don't forget to like, share, and subscribe for more cutting-edge tech tutorials. Your support fuels our passion for tech education and innovation!
Relevant Links:
Repo: [https://github.com/developersdigest/Siri-of-Everything]
Node.js: [nodejs.org]
OpenAI API: [platform.openai.com/account/api-keys]
Perplexity API: [docs.perplexity.ai/docs/getting-started]
GitHub Repository: [github.com/developersdigest]
Follow me on Twitter for updates: [@dev__digest]

Thank you for joining us in this fast-paced, educational journey to build your own Cross-LLM Voice Assistant in Next.JS!
---
type: transcript
date: 2024-01-10
youtube_id: Ku4VU3O41cQ
---

# Transcript: Build a Multi-LLM Voice Assistant in 12 Minutes with Next.JS

In this video I'm going to be showing you how you can build out your own voice assistant where you can have conversations with a multitude of different LLMs, so you can think of this as akin to something like Siri or Google Assistant. First I just want to give a quick demonstration of how this will work: "GPT, how far away is the Moon?" "GPT-3.5 here. The Moon is approximately 238,857..." "...details about Defense Secretary Lloyd Austin's secretive hospital stay and the delay in telling President Biden."

The first thing we're going to do is just create a Next app, so you can go ahead and create a Next app with Bun, name it whatever you want, and we're going to use that as the starting point for our application. Right off the bat, just create a `.env` file; you're going to have to get an API key from OpenAI and Perplexity. Now, Perplexity and OpenAI are slightly optional if you don't want to use those endpoints. Additionally, if you want to use other APIs or LLMs, you can go ahead and hunt down those API keys. By the end of the video you'll have a good handle on how to incorporate just about any LLM into this.

I have things organized in two different files just to keep it succinct and easy to understand throughout the video. The first thing we're going to do is import the necessary hooks from React. Since we're using TypeScript, we're going to be setting up some of those types throughout the application, which you'll see here and there. Now we're going to wrap everything within our Home component, and then we're going to run through a handful of hooks: we're going to have hooks for whether it's recording or playing, one for the transcript, the model that's selected, the response, and a loading state as well. We're also going to have some ref hooks; these are going to be used to detect any silence within the application.

This is the function that's going to get applied to each of the bubbles depending on which model is responding, so it'll give that slight pulse back and forth, say, if it's GPT-4 or GPT-3.5; whichever model is responding will pulsate back and forth on the front end for you.

Next we're going to set up the main function of how this essentially works and sends everything to the back end. What we're going to do is set that loading state; the loading state will create a spinner, and that spinner will be applied to the back of that bubble. Next we're going to see if there's already a model keyword that's been determined, and if there isn't one, we're going to default to GPT-3.5. For instance, if you don't specify "hey GPT-4", it's going to default to GPT-3.5, but you can default this to whatever model you like, so if you want to use a local model, a free model, or maybe a model where you have some credits, you could swap that in here as well.

So first we're going to stop recording, and a lot of this will make sense as we go through it. For the actual POST request, what we're going to send to the back end is the message that we get from the voice API within the browser, and we're also going to send the model keyword. Next we're going to check if it's an audio file; if it is, we're going to set that it is playing, and then, on end, once it has stopped playing, we're going to start the recording again. Essentially, whenever that voice is responding back to us, it's not actually listening to our microphone, so you don't run into scenarios where it's recursively listening to the responses, triggering and transcribing, and putting you in a loop that you don't want. Then we're just going to have some simple error handling, and we're going to set the loading state back to false once it's all done.

Next we're going to set up our individual model bubbles. Within our handleResult is where we create the transcript that we're going to send to the back end. Then we're also going to detect whether there is silence for a period of 2 seconds; that's what this 2,000 is. From here we're going to determine all of the different model keywords that we have, so this is where you specify all of the different models you're going to be using in your application; you see GPT-4, Perplexity, etc. From there we're just going to check the first three words to see whether a model keyword exists, so if you say something like "hey there, model name", it will be able to detect that, or you can trigger the model name within the first or second word if you want.

Similar to what we did earlier, we're going to set the default model if one isn't detected; I use this GPT keyword for GPT-3.5. You can really specify whatever keywords you want, but the thing you'll have to contend with is how the voice API returns them. For GPT-4, for instance, when I said that keyword, it typically puts it all within one string, and you'll have to play around with all of them a little bit to see what works. For example, one keyword I tried never came through right, so I use "mixture", as in mixture of experts, for that keyword. So you can play around with these; you could even use a name if you wanted, so you could say "hey Joe", or "hey" whatever, to trigger these as well. Then we're just going to send that transcript to the back end with our detected model, and set our transcript back to blank.

This is how we set up the WebKit speech recognition. Now, the neat thing with this is you don't need to use something like Whisper, so the latency is going to be significantly decreased by not having to wait for those results; the trade-off with the Web Speech Recognition API is that its results might not be as high quality as Whisper's, but it's a trade-off, like everything in programming. From there, we're just going to use a simple cleanup for when our component unmounts. Then we're going to create a function just to stop recording, so you can click that button to stop the recording, and we're also going to handle when the recording starts, so you can toggle that button in the center there.

Next we're going to set up our JSX. In the first part of our JSX we're going to have the area that shows it's listening and displays the transcript. I found this helpful, especially if you're trying out different keywords, to actually see everything and make sure it's working and detecting your voice correctly. Next, we're just going to render out all of those model bubbles; this is the JSX we set up a little higher in the application. Within this we specify the different models, the keyword, what we want displayed visually within the model bubble, and their colors; pretty straightforward. Then in the center I have positioned our button that handles toggling the recording, listening, and transcribing on and off. From there we just have the latter half of all of our other models.

So from here we're just going to create a new route. Within Next.js you can create `api/chat` and then `route`, and then we're going to hop right into our route. The first thing we're going to do on the back end is import a handful of dependencies, so if I just open up the package.json, what we're going to install is basically everything here: the LangChain packages, dotenv, and openai. Once those are installed, you'll start to see what all of these are doing as we fill this out. Next, we're going to initialize the Perplexity SDK; the way Perplexity decided to implement the Node version of their API is with this SDK package, so that is what this dependency is doing here. Next we're just going to configure our environment variables; those environment variables from the beginning of the video, this is where we access them. Similar to the front end, we're going to set a few things up so all the TypeScript types work as they should.

Next we're going to initialize the OpenAI instance. We're going to create a reusable function that is how we interact with the OpenAI text-to-speech endpoint; we're going to be using the tts-1 model. This function takes in the message that we get back from the LLM after it responds to our transcript, and we're also able to specify a unique voice for each LLM, so you'll be able to swap these out for whatever voice you like; there's also an HD model if you'd like higher-quality voices. All this function does is return that MP3 in base64 form, which we're going to send to the front end.

Next we're going to set up our POST handler. Within the POST handler we're going to extract the message that we get from the front end, and then, similar to what we had on the front end, if the model isn't specified, we're going to default to that GPT-3.5 model. Then we're going to remove the first word of the string; you could probably put some more sophisticated logic here, but the thought is that if you say something like just "GPT" or "Perplexity", that keyword isn't going to be passed into the LLM.

From there we're going to set up our intro message. Next we're going to initialize the intro message and set the base64 audio; this is where we put a lot of the different variables depending on the condition. From here we're going to give a common prompt to all of the models: we're going to pass the same message directly to all of them, and we're going to specify for it to be precise and concise and never respond in more than one to two sentences. The way this is set up, since not all of the models we're interacting with are built into LangChain, we're going to see different examples of how you could incorporate something like the Perplexity API, which isn't yet incorporated into the LangChain JS version.

Next we're going to use this if/else statement; you could also use a switch, or potentially a hash map if you have an awful lot of different models. What we're doing here is specifying the voice we're going to use and the intro message; you saw in the demo that it responds back with the model name and then the message. Depending on the model, we're going to instantiate, within each case, the LLM that you want to use, along with an API key or any setup it requires; you'll see that some of the LangChain ones have similar syntax. Then we have the intro message, which, as you saw in the example, responds back with the model name and then the message, and this is where you specify the voice. I plugged in a handful of the different voices from OpenAI, so I should cover just about all of them within this example.
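The "check the first three words, then fall back to a default" keyword detection described above can be sketched as a small pure function. The keyword list, model names, and default are illustrative assumptions, not the exact code from the video:

```typescript
// Hypothetical sketch of the model-keyword detection described above.
// Keywords are assumptions; swap in whatever your speech API reliably returns.
const MODEL_KEYWORDS = ["gpt", "gpt-4", "perplexity", "mixture", "llama"];

// Only the first three words are checked, so "hey there mixture ..." still
// routes correctly, while a keyword later in the sentence is ignored.
function detectModelKeyword(transcript: string): string {
  const firstWords = transcript
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, "") // strip punctuation the voice API may add
    .split(/\s+/)
    .slice(0, 3);
  const hit = MODEL_KEYWORDS.find((keyword) => firstWords.includes(keyword));
  return hit ?? "gpt"; // default keyword (GPT-3.5 in the video) when none match
}
```

Because it is a pure function, you can unit-test it against the odd strings your browser's speech recognition actually produces before wiring it into `handleResult`.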
Similarly, for GPT-4 you can specify the model name within the declaration for the LLM, and just like here we have the intro message as well as the voice; it's very similar for a lot of them. Now for the Ollama ones, the way it works is that you need Ollama running and the models downloaded, so if you want to use the Ollama models you will have to make sure you download the specific models; on my machine I have the Mistral model as well as the Llama model installed and ready to go. Similar here: we have the intro message and then a different voice. Also, I specified "local Mistral" for this one because I am also using the Mistral endpoint from Perplexity, and the Mistral endpoint from Perplexity is very fast for inference. It's the exact same setup for Llama 2 with just a few different options; just like the Ollama Mistral setup, the local Llama setup is basically the same, with the Llama model and voice swapped out.

Then for the Mixtral model, this is where we start to play around with the Perplexity API. It's pretty easy to interact with this SDK; it has a slightly different structure and syntax, as well as a schema you have to contend with, but nonetheless it's set up so that a lot of the variables are reusable, like the intro message and voice, and the only thing you have to substitute is this area when interacting with the Perplexity API. Next we're going to set up the Perplexity model; this is a really neat model where you can get up-to-date information, and if you haven't used Perplexity I encourage you to check it out. Then from there we're just going to set up the Llama 2 70B model.

Once we have all that set up, we're going to create our full message: we add the intro message to the message back from the LLM and create the full message that we send into the audio-creation function. Actually, just looking at this, I could remove the intro message from both the argument there and the parameter within the function, so we don't actually need that. Then we're just going to send back the JSON with that base64 string; we're going to specify that it is audio/mp3, and we're also going to pass back the model name.

So that's pretty much it. There might be a couple of tweaks I make once the repository goes live, but overall I'm going to keep it largely the same. If you found this video useful, please like, comment, share, subscribe, and consider becoming a paid subscriber on Patreon or on YouTube. Otherwise, until the next one!
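The response shape described above, a base64 MP3 plus the model name, can be sketched as a tiny helper. Field names here are assumptions for illustration; the video's actual route handler may name them differently:

```typescript
// Hypothetical sketch of the JSON payload the route handler returns.
// "audio" and "model" are assumed field names, not confirmed from the repo.
function buildResponse(base64Audio: string, modelName: string) {
  return {
    // The front end can feed this data URL straight into an <audio> element.
    audio: `data:audio/mp3;base64,${base64Audio}`,
    model: modelName,
  };
}
```

Returning a data URL keeps the front end simple: it only needs to set the string as the audio source, with no extra decoding step.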
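One more piece worth sketching is the front end's 2-second silence detection mentioned earlier: each new speech-recognition result resets a timer, and only when no result arrives for the full window does the transcript get sent. This is a minimal sketch under that assumption, not the video's exact code:

```typescript
// Hypothetical sketch of the 2000 ms silence detection described earlier.
// onSilence would send the transcript to the back end in the real app.
function createSilenceDetector(onSilence: () => void, ms = 2000) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return {
    // Call on every speech-recognition result; restarts the countdown.
    reset() {
      if (timer) clearTimeout(timer);
      timer = setTimeout(onSilence, ms);
    },
    // Call when recording stops or the component unmounts.
    cancel() {
      if (timer) clearTimeout(timer);
    },
  };
}
```

Holding the timer id in a closure mirrors the video's use of a ref: the id survives re-renders without itself triggering one.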