
In this video I show you how to build out your own Siri/Alexa/Hey Google style voice assistant with Node.js, LangChain, and ElevenLabs. https://www.patreon.com/DevelopersDigest Links coming soon!
---
type: transcript
date: 2023-10-10
youtube_id: myvkeCrw6Rg
---

# Transcript: Create Your Own Voice Assistant with Node.js, LangChain & ElevenLabs in 9 Minutes

In this video I'm going to show you how to set up your own voice assistant using LangChain, the OpenAI API, and ElevenLabs. By the end of this video you'll have about 100 lines of code that you can customize with whatever you'd like to do with large language models. If you'd like to set up an agent, or have it integrate with OpenAI functions, or whatever you'd like to do, this will give you a starting-off point. All the code we're going to write is in Node.js; we're not going to set up any front end, and all of the voice listening and all of the responses will play and generate within the backend here.

You're going to need a couple of things to set this all up. First, you'll need API keys from ElevenLabs and OpenAI. It's really simple to get them: once you've logged into ElevenLabs, just click your initial on the right-hand side, go to your account, and view your API keys. Similarly for the OpenAI API: if you haven't made an account, just go to platform.openai.com, make an account, go to the API keys page, and create a new secret key.

Once you have that, you can run `npm init -y` in a new project directory, or use something like `bun init`, to get a starting point for your Node.js project. Then make a `.env`; that's where we'll put our OpenAI API key and our ElevenLabs API key. Once you have that, go ahead and create an `index.js`, and while you're at it, also create a directory called `audio`. This is where we'll essentially cache the audio files that we're listening for on our local machine, as well as the responses we get back from ElevenLabs.

Once all of that is set up, let's head over to `index.js`. I have some comments here on what we're going to go through in just a moment, but one thing I want to point out: when you install the mic package, make sure you also have SoX installed if you're on Mac or Windows, or ALSA's `arecord` if you're on Linux. If you have Homebrew installed you can just `brew install sox`; if you run into errors, a missing recording tool could potentially be why, so I wanted to point that out.

Then, within a comment block, I have all the different packages you can `npm` or `bun` install: `mic`, `sound-play`, `wav`, `stream`, `openai`, and a handful of others. I'm also going to put this in a repo linked in the description of the video if you just want to grab all the code and get going. The first thing we're going to do is import these within our `index.js`; installing them might take a moment or two. I'm not going to go through them all here, but I'll dive into what
they're all doing as we actually write out the functions and leverage the code.

First, we're going to set up the OpenAI instance, as well as a variable for the keyword we're going to detect. If you think of something like Siri or Google, what they do is listen for a particular keyword, right? Whatever you put here will be the keyword the transcription condition listens for; the assistant won't perform the subsequent tasks if it doesn't hear it. I just use the "GPT" keyword in my application, but feel free to use whatever you'd like.

Next is some of the initial microphone setup: the rate, channels, all sorts of things like that. You can play around with this; look into the mic documentation and adjust it a little if it doesn't work for you, but that's essentially what we're doing here.

Next, we initiate the recording process. Right off the bat, we stop it if it has already run. The way this is set up, if it has already gone through a query and responded back with voice, it stops the process and essentially cleans that up for us, so we don't get a buffer overrun or anything like that in our application. All we're doing here is saving out to the audio directory when it detects a silence, and the silence handler is the next function we have here.

The silence function is where we actually save out the audio, and it's also where we trigger the transcription. Now, the thing with the transcription is that even if we don't detect the keyword, we still have to transcribe, right? It essentially is always listening. But if it doesn't hear the keyword, it won't get a response from the OpenAI API and subsequently send that response to
the ElevenLabs API. Once we have the transcription, we check for the keyword here, await the response from the LLM, and then convert that response to audio, leveraging our LangChain setup as well as the OpenAI setup we have. Then we wait for the response audio, and I have console logs throughout, so it will just run and loop as you have a conversation with the LLM.

Then we have a simple function that saves out our audio; that's essentially all it does. After that comes everything we need to actually transcribe the audio. Setting up the Whisper transcription from OpenAI is very simple if you're using their new SDK: as long as you have the API key, you can call it just like this, and it's only a few lines of code to get the transcription back.

Next, we set up simple communication with the OpenAI LLM. I integrated this with LangChain on the assumption that most people will want to connect it to other aspects of LangChain, whether that's OpenAI functions or what have you, so here's the boilerplate and starting-off point for that. This responds with the text from the LLM. Finally, this is where we send the LLM's response to our ElevenLabs endpoint, wait for that to finish, and then play out the audio. Right at the end, we start the process, and I keep it persistent without exiting by adding `process.stdin.resume()`.

So I'll demonstrate it here. If I save this out and run `node index.js`, you see that it starts the listening process; I'm going to make this a little bigger. Now, if I just pause for a moment, you see it detected silence and then it restarts again. It didn't detect the keyword, which I'm not going to say quite yet, but you can see that every time I pause it will go ahead and transcribe. So if I say, "Hey GPT, how far away is the Moon?" The transcription came back as "hey GP how far away is the Moon", so let's try that one more time: "Hey GPT, how far away is the Moon?" "The average distance from Earth to the Moon is approximately 238,857 miles." And there you have it.

That's basically it. Obviously, you can add whatever you'd like within this. I'd imagine most people would be adding a lot of logic around the LangChain area here, because maybe you could integrate an agent and have it perform certain actions, or get it connected to the internet in some way, so it's not just talking directly to an LLM. That's pretty much it for this video. If you found it useful, please like, comment, share, and subscribe, and consider becoming a subscriber on Patreon as well. Otherwise, until the next one!