In this video, we dive into the world of Eleven Labs, an AI-powered text-to-speech platform that enables you to create lifelike voices for your web applications. We'll explore how to get started with Eleven Labs and integrate it into your Node.js projects, making your applications more engaging and interactive. Throughout the tutorial, we'll cover the basics of Eleven Labs, including creating and cloning voices, generating voiceovers, and integrating the API into your Node.js projects. By the end of this video, you'll have a solid understanding of how to use Eleven Labs to enhance your web development projects with AI-generated speech. Don't forget to follow me on Twitter and GitHub for more updates and tutorials on AI, Node.js, and web development: Twitter: https://twitter.com/dev__digest GitHub: https://github.com/developersdigest
---
type: transcript
date: 2023-08-02
youtube_id: BkIFy7K5Du0
---

# Transcript: Unlocking AI Voices: Eleven Labs Text-to-Speech

In this video I'm going to be showing you Eleven Labs: both their web interface and how to get started with their API by building out a simple Node.js application.

Eleven Labs, if you're not familiar, is a company that's really focused on the text-to-speech portion of the generative AI boom. They've come up with an incredible offering where you can simply make an account and get going really quickly, without needing to read the documentation or know too much about it right off the bat.

The thing to note with Eleven Labs is that when you make an account, you don't need a credit card to access the services. You get 10,000 characters per month, and you can use up to 2,500 characters per piece of text generation. So if I demonstrate it here and generate "Developers Digest", you see that it takes that text and, almost instantly (obviously this is a short piece of text), gives me that piece of text back as audio.

Now if I look at the quota here, you see that as I click generate, this number starts to go down. The number right now is 9,113. I'm just going to hop over to my Node.js application and show you the nice thing about how they've integrated this: if I run a Node.js equivalent and ping their API, you see that the number is now lower after generating from the API. So it's really nice that both the API and the web GUI draw from the same quota, so everything's bucketed and clear.

If you're thinking about how far 10,000 characters really gets you, it's about 12 minutes, and where I got
that 12 minutes from is their Creator tier: 100,000 characters per month is about two hours of generated audio, so 10,000 characters gives you about 12 minutes to play around with. So if you're playing around with this, just be mindful that you can run through those credits pretty quickly. But if you're interested or already have a use case in mind, you can get started very simply by picking one of their tiers. They start at five dollars, then twenty-two dollars, all the way up to hundreds of dollars if you're using this at an enterprise scale.

The other thing with Eleven Labs is they have this Voice Library, and I found the Voice Library much more interesting and easy to use than trying to find a voice in the drop-down of their pre-made models. You see here that it defaults to sorting by trending, so you get a view of what others find interesting, or what are potentially the best models, since they sort of bubble to the top. And you can just click some samples here. So if I click this one: "Excellence is not a skill, it is an attitude." And I'll just click a couple more: "It is not enough..." Within these you can hear that they all have very different intonations.

The other cool thing with Eleven Labs is this area called the Voice Lab, where you can actually create different voices. You can grab different models and use those as boilerplate, but you can also create your voice from scratch. I'm not going to be doing that in this video, but if you've done it, I'd be curious to hear in the comments below whether it worked, how well it worked, and what your experience generally was. From my understanding, that's a newer part of their offering, where you can actually clone your voice, and you can start to think of a whole host of applications
where you'd want to have your voice used for something, right?

Another nice thing with Eleven Labs is that by default, if you send a request through the web interface or the API, everything shows up within the History tab here. Why this could be useful: let's say you're setting up your Node.js or Python integration and you didn't save out some pieces of audio that you wish you had. You could just come into the History tab, click the one that you need, and download it. It's one thing if it's a few words that you're generating audio for, but you can imagine that if you're generating your 2,500 characters worth of audio, you wouldn't want that thrown away or accidentally not saved.

The other thing to note is that with their free tier you have up to 2,500 characters per generation, but with their paid tiers you have up to 5,000 characters. So you won't be able to just feed this, say, a whole book of yours; you'll have to do that in segments if that's something you're interested in doing. Just something to be mindful of. I'd imagine that over time a limit like that will probably increase as well.

They also have very simple, very clean API documentation where you can see how to integrate this into your application. I'm going to be showing you a Node.js example, but if you're using a different programming language, I'd encourage you to head over here. The documentation is very clear and straightforward, so you can spin this up in whatever programming language you're using.

With that said, I'm going to show you a Node.js wrapper that I found on GitHub that makes it really simple to interact with their API for us Node.js developers. Head over to this repo if you're interested; I'd encourage you to give it a star or fork the repo. Shout out to the contributors on this. And all it really is
is that it simply requires you to install the package, grab an API key, and grab the voice ID, and then you're off to the races. There's an integration where it will save out a file for you, or you can take that file, encode it in base64, and save it, or whatever you want to do with it.

If I head back to VS Code here, I'll show you how simple it really is. I'll just clear my terminal here. I won't actually run these commands, but if you run npm init -y, that will give you your package.json, and if you npm install elevenlabs-node as well as dotenv, you can go ahead and install those. Then once you have dotenv and elevenlabs-node installed, you can just touch index.js and .env. That will give you your index.js as well as your .env. Then for your .env, all you have to do is head back over to their website and click your initial, and your API key is right there for you to grab. Go to your .env file, put in API_KEY= and paste in your key.

Once you have that all installed and those files set up, you should be good to go. In the example on the GitHub repository there's a voice ID already set, and in this demonstration of the wrapper I'm just going to be saving out that audio. So if I say "Hello YouTube" and run this, you can see you get a response back quite quickly. Obviously it's a very short piece of text, "Hello YouTube". Because it is pretty quick, you could imagine certain use cases where you want to generate audio on the fly. Say it's a video game context and you want characters to generate audio on the fly. Imagine a video game where you're talking into a microphone, Whisper is transcribing that, then that transcription gets sent to an LLM
and then, in subsequent steps, it gives you a response in audio from a character in real time. So there's a lot of interesting use cases. It could be a gaming context, or it could be like the movie Her, where you're talking to something in your ear. I'd imagine there's going to be a lot of different interesting use cases to come out of this.

One that I found more novel, and probably more practical and easy to implement for web developers, is simply adding a button or a little audio file to something like a blog post, or anything that has a long piece of text. That could be interesting for accessibility's sake, for people who can't see the screen: having the option to actually hear the text, and not necessarily in that computerized voice that a lot of screen readers have. Or say you're on the go or driving; being able to click and listen to an article is another option. So it's interesting to have this new medium that you can play around with, where you can generate audio on the fly from text.

Like I mentioned, I also wanted to quickly touch on a couple of options for how you can access some open source models. If you go over to Hugging Face, you can find a handful of text-to-speech models, and their top model right now is this Bark model. You can head over to GitHub; there's a ton of stars on this project right now, I think somewhere on the order of 25,000 stars, so a lot of people are really excited about this particular model. You can listen to some examples here, so I'll just play a couple. The model is called Bark, like Clifford the Big Red Dog, or, um, bark as in tree bark. There's a handful of different examples here. I won't sit here and play you all of them, but if you're more interested in deploying your own model and having full control over it, this is a potential avenue that you
could take. You could go to Hugging Face and pull this down, or GitHub for that matter, or just check out all the different ones that are trending on Hugging Face. If you're not familiar with Hugging Face, I have a video on that, which I'll potentially link somewhere in the description, where you can take a look at how easy it is to either use their inference models or deploy these to your own services. So that's another option that you can explore.

But the thing with Eleven Labs is it comes with all of that set up. Say you don't want to worry about how to set all that up or maintain it: here's an option where you can get set up and going pretty quickly, and you can lean into it. Maybe you don't need those high tiers right off the bat; maybe you just want to develop some proofs of concept and play with this. It gives you a very reasonable amount to play with for free each month, and even the paid tiers give you quite a bit without being exorbitantly expensive.

So hopefully you found this video useful. If you did, please like, comment, share, and subscribe, and otherwise, until the next one.
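
As a rough sketch of the arithmetic behind the 12-minute figure in the video (100,000 characters is about two hours of audio), the helper below just scales that ratio linearly. The name `estimateAudioMinutes` and the linear assumption are illustrative only; real duration will vary with the voice and the text.

```javascript
// Ratio quoted in the video: 100,000 characters is roughly 2 hours (120 minutes) of audio.
const REFERENCE_CHARS = 100000;
const REFERENCE_MINUTES = 120;

// Hypothetical helper: linear estimate of audio length for a character count.
function estimateAudioMinutes(characters) {
  return (characters * REFERENCE_MINUTES) / REFERENCE_CHARS;
}

console.log(estimateAudioMinutes(10000)); // 12 (the free monthly quota)
console.log(estimateAudioMinutes(2500)); // 3 (one maximum-length free-tier request)
```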
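
The video mentions that longer text, like a book, has to be sent in segments because of the 2,500-character (free) or 5,000-character (paid) limit per generation. One possible way to do that splitting is sketched below. The function `splitIntoChunks` is a hypothetical helper, not part of Eleven Labs or the wrapper, and it only breaks on space characters, so treat it as a starting point.

```javascript
// Hypothetical helper: split text into chunks no longer than maxChars,
// preferring to break at a space so words are not cut in half.
function splitIntoChunks(text, maxChars = 2500) {
  const chunks = [];
  let remaining = text.trim();
  while (remaining.length > maxChars) {
    // Find the last space at or before the limit.
    let cut = remaining.lastIndexOf(" ", maxChars);
    // No usable space: fall back to a hard cut at the limit.
    if (cut <= 0) cut = maxChars;
    chunks.push(remaining.slice(0, cut).trim());
    remaining = remaining.slice(cut).trim();
  }
  if (remaining.length > 0) chunks.push(remaining);
  return chunks;
}

console.log(splitIntoChunks("one two three four", 8));
// [ 'one two', 'three', 'four' ]
```

Each chunk could then be sent as its own text-to-speech request, with the paid-tier limit passed in as `maxChars` where it applies.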
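
The video uses a community Node.js wrapper, whose exact function signatures aren't shown in the transcript, so the sketch below swaps in a direct call to the Eleven Labs REST text-to-speech endpoint instead. It assumes Node 18+ (for the built-in `fetch`), an `API_KEY` value loaded from your .env (for example with dotenv), and a voice ID copied from your account. The names `buildTtsRequest` and `saveSpeech` and the `VOICE_ID` variable are hypothetical, and the request body is kept to the bare minimum as I understand the public API docs.

```javascript
const fs = require("node:fs/promises");

// Build the HTTP request for the text-to-speech endpoint.
// Kept separate from the network call so it is easy to inspect.
function buildTtsRequest(apiKey, voiceId, text) {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`,
    options: {
      method: "POST",
      headers: {
        "xi-api-key": apiKey,
        "Content-Type": "application/json",
        Accept: "audio/mpeg",
      },
      body: JSON.stringify({ text }),
    },
  };
}

// Send the request and save the returned audio bytes to a file.
async function saveSpeech(apiKey, voiceId, text, fileName) {
  const { url, options } = buildTtsRequest(apiKey, voiceId, text);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const audio = Buffer.from(await response.arrayBuffer());
  await fs.writeFile(fileName, audio);
  return fileName;
}

// Example usage (not run here; VOICE_ID is a hypothetical environment variable):
// saveSpeech(process.env.API_KEY, process.env.VOICE_ID, "Hello YouTube", "hello.mp3");
```

Requests made this way should count against the same character quota and appear in the History tab, as described in the video for API calls.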