
Getting Started with Hume AI: A Comprehensive Next.js Template Tutorial

In this video, I demonstrate how to get started building with Hume AI using their new Next.js template. Hume AI has recently gained attention for their empathic voice interface. I guide you through the process of integrating Hume into your project, pulling the template from GitHub, and setting up API keys. You'll see how the template leverages WebSockets for real-time voice processing and detects a variety of emotions. Additionally, I showcase the template's structure and how to customize it for your needs. Finally, I explore the potential of voice interaction in web apps and discuss Hume's integrations with other models like GPT-4o and Groq. Check out the template and start building your own projects!

00:00 Introduction to Hume AI
00:45 Setting Up the Next.js Template
01:15 Exploring the Voice Interface Features
01:48 Understanding the Technology Behind Hume
03:50 Integrating Other Models and Tools
05:25 Real-Time Data Transmission with WebSockets
---
type: transcript
date: 2024-07-01
youtube_id: LyfOUT1teWU
---

# Transcript: Hume AI: Empathic Voice Interface - Next.JS Template

In this video I'm going to be showing you how you can get started building with Hume AI. You might be familiar with Hume AI from when they went viral a few months ago, just before the GPT-4 Omni demo, and they've been making some waves lately. What I'm going to be showing in this video is how to get started with their new Next.js template. The thing that's impressive with Hume, and how they've developed all of their offering, is that it really is voice first, with what they're calling an empathic voice interface. It gives you a great way to get started on leveraging some of these voice models. I'm going to quickly show you what the template itself looks like, which I think they did an excellent job on, and how you can get started with integrating Hume within your project.

To get started with the template, you can head on over to GitHub and pull down the repo. Once you have it pulled down, the only things you need to get started are a couple of API keys. You can head on over to their console, click on API keys, and you'll be able to copy both of those values. Where you're going to be putting those is within the `.env` at the root of the project, where there's a `.env.example`.
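To make the key setup concrete, here's a minimal sketch of a helper that validates the two values. The variable names `HUME_API_KEY` and `HUME_SECRET_KEY` are assumptions for illustration; check the template's `.env.example` for the names it actually expects.

```typescript
// Hypothetical helper for the key setup step: the template reads its two Hume
// values from environment variables. HUME_API_KEY and HUME_SECRET_KEY are
// assumed names, not necessarily the ones the real template uses.
function getHumeKeys(env: Record<string, string | undefined>): {
  apiKey: string;
  secretKey: string;
} {
  const apiKey = env.HUME_API_KEY;
  const secretKey = env.HUME_SECRET_KEY;
  if (!apiKey || !secretKey) {
    // Fail fast with a pointer back to the setup step from the video.
    throw new Error(
      "Missing Hume keys: copy .env.example to .env and fill in both values"
    );
  }
  return { apiKey, secretKey };
}
```

In the app you'd call `getHumeKeys(process.env)` on the server side, since keys like these shouldn't be exposed to the browser.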
That example file looks just like this: just remove the `.example` suffix, then paste in your keys and save them out. Now once you have it all set up, you can click Start Call. You have this nice component here where, when you're speaking into the microphone, it registers that and renders it visually. You have the ability to end the call, and you can also mute your microphone, which I'll show you in just a moment. I'll pause for a second for it to respond back, and you'll see what it looks like here.

"Fantastic! Oh, I got that visual component. Sounds pretty cool. Ending the call and muting the microphone are handy features. Can't wait to see the demonstration."

As you see there, I just muted it, so now it isn't actually listening. I can walk you through a little bit about the different pieces here. As you can see, this template shows the different emotions that are being conveyed. A little bit of the trick on how this works, and why it's so quick, is that it's streaming it across the wire, in this case using a WebSocket. What's nice with leveraging WebSockets or WebRTC is that you can send that information in real time. Then, once it detects a silence, or that you've stopped talking, the server already has all of that information: it can transcribe that text, send it to an LLM, and then also send it to that text-to-speech model, all within a relatively short amount of time.

Now, the thing that's interesting with Hume is that it can actually detect the different emotions within your voice. It detected concentration, interest, and determination, and then as it was responding back, it was able to give me the various metrics (interest, calmness, amusement), all of those different intonations that we got back from the voice. There is a ton of interest in developing voice interfaces right now, especially after GPT-4o and their demo, which was really focused on that voice capability.
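A minimal sketch of the silence-detection idea described above, assuming a simple energy check over audio chunks. This is not Hume's actual implementation: the RMS approach and the 0.01 threshold are illustrative assumptions, and in the real template the streaming client and server decide together when the utterance has ended.

```typescript
// Root-mean-square energy of one audio chunk.
function rms(samples: Float32Array): number {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

// Treat a chunk as silence when its average energy falls below a threshold.
// Threshold value is an illustrative assumption.
function isSilent(samples: Float32Array, threshold = 0.01): boolean {
  return rms(samples) < threshold;
}
```

On the client this would sit in the audio capture loop: each chunk is sent over the WebSocket regardless, and a run of silent chunks signals that the utterance is over and the transcribe, LLM, and text-to-speech steps can kick off on the server.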
What was interesting with that demo is that it was able to detect the emotions within your voice, and it was able to respond back in kind with the different intonations of the different emotions that were conveyed. What's nice about this is you have a full Next.js template that you can play around with, and they also have really comprehensive docs. It's always nice to see companies open-source templates like this, because this is really the way that you can bootstrap developers. While it's nice to have some really small examples sometimes, it's also nice to have these more comprehensive examples where someone could just take the project, quickly iterate on it, and build out whatever they like.

It has a really nice structure in terms of the project; it's really what you would expect from a Next.js application. You can find all your various components within here, and there are a couple of utilities, but primarily it's within the components folder that you're going to be changing out the different values and whatnot.

Now, the thing that's also interesting with Hume is that they offer things like being able to integrate other models: you can integrate GPT-4o, you can integrate Groq, you can integrate Claude. They also have a really great playground as well. You can go ahead and start a call with your custom configuration that you have set up within here, and it will stream back just like the Next.js starter kit I just showed you. In terms of their documentation, there's also quite a bit within here, and another thing that's really nice to see is they also have tool use, if you want to talk to this model and then subsequently use some sort of tools.

Being able to leverage voice like you just saw, in addition to using tools, can open up all sorts of different experiences in the context of a web app. We're not really used to talking to websites, but this is a new sort of frontier that we can start to explore, right? You can imagine landing on a website and just starting to ask for things.
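As a sketch of how a component might surface the per-emotion metrics mentioned above (interest, calmness, amusement, and so on), here's a small helper that picks the top scores to display. The flat name-to-score record is an assumed shape for illustration, not Hume's actual message schema.

```typescript
// Assumed shape: a flat map of emotion name to confidence score.
type EmotionScores = Record<string, number>;

// Return the n highest-scoring emotions, highest first, for display.
function topEmotions(
  scores: EmotionScores,
  n = 3
): { name: string; score: number }[] {
  return Object.entries(scores)
    .map(([name, score]) => ({ name, score }))
    .sort((a, b) => b.score - a.score) // highest score first
    .slice(0, n);
}
```

A component in the template's components folder could map the result of `topEmotions` straight to the little emotion badges you see during a call.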
Say you go to Uber Eats and you say, "I want to order a burger that has sweet potato fries," or whatever, right? You could imagine talking into your microphone on the Uber Eats website, and then maybe it quickly spins that up and you can just talk through the steps, rather than using your mouse or keyboard or something to that effect. I think it's really interesting to explore this type of thing. I'm definitely going to be digging into it a little bit more and seeing what sort of examples I can potentially build out that could be interesting. But overall, they've done a great job in terms of the playground they've built, and also in terms of offering these open-source starter kits for developers.

Now, one last thing that I wanted to show you, in case you're interested, is how this works under the hood. If we go ahead and inspect this and look at the network requests, I'm going to refresh the page, clear it out, and then as soon as I click Start Call we'll see that a WebSocket has been established. If I click that WebSocket (and you can do this on your end as well) and then click Messages, you'll see all of the messages that are being sent across in real time. It's converting that voice and sending it across as binary, and when I take a pause, you will see that it responds back with a JSON payload within this WebSocket stream. So I'll just pause for a second and then we'll see those come up.

So I'm just going to end the call there, cycle back up to the messages, and then you can see how this works. It's constantly sending your information across the wire, and at the point that it detects a silence, it's going to go ahead and do the sub-steps on the back end, where it's sending it to an LLM and so on. The nice thing about using WebSockets or WebRTC is that all of that data is already at the server as it's being streamed across.
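The two kinds of frames visible in the network tab (binary audio going up, JSON payloads coming back) suggest a message handler that branches on the frame type, sketched below. The `kind` wrapper is my own illustration, not part of Hume's SDK.

```typescript
// A classified WebSocket frame: raw audio bytes or a parsed JSON payload.
type Incoming =
  | { kind: "audio"; bytes: ArrayBuffer }
  | { kind: "json"; payload: unknown };

function classifyFrame(data: ArrayBuffer | string): Incoming {
  if (typeof data === "string") {
    // Text frames carry the JSON payloads (transcripts, emotion scores, etc.).
    return { kind: "json", payload: JSON.parse(data) };
  }
  // Binary frames carry raw audio.
  return { kind: "audio", bytes: data };
}
```

Wired up to a browser WebSocket this would look roughly like: `socket.binaryType = "arraybuffer"; socket.onmessage = (e) => handle(classifyFrame(e.data));`.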
You can imagine this in a video context or an audio context: instead of waiting for that audio to be sent as one big payload, this way you're able to save on the latency. It might only be a couple hundred milliseconds, but all of that starts to add up, especially in an application like this that's talking back to you in real time and trying to emulate a human having a conversation. I think we're definitely going to be seeing a lot more applications that are both voice-based and using WebSockets, or potentially WebRTC, like we saw with the LiveKit announcement, if you caught that.

But that's pretty much it for this video. I encourage you to check out the template and play around with it. If you build anything interesting, put it in the comments below. If you found this video useful, please like, comment, share, and subscribe. Otherwise, until the next one!