
Creating an AI-Enhanced Podcast Web App: Comprehensive Tutorial

In this video, I'll guide you through building an AI-enabled full-stack web app that not only crawls and processes data from various links but also generates a podcast script and audio using state-of-the-art technologies. We'll walk through the process step by step, using Firecrawl for data scraping, the Llama 3.2 90B model via Groq for LLM inference, and ElevenLabs for text-to-speech conversion. I'll cover everything from the front end and back end to setting up and deploying the application. You'll also learn about error handling, integrating audio functionality, and adding simple animations. By the end of this video, you'll be able to configure and run your own comprehensive podcast engine. Check the description for all the necessary links and resources.

Repo: https://github.com/developersdigest/llm-podcast-engine

You can obtain these API keys from the following sources:

- Firecrawl API Key - https://www.firecrawl.dev/app/api-keys
- Groq API Key - https://console.groq.com/keys
- ElevenLabs API Key - https://try.elevenlabs.io/ghybe9fk5htz

Chapters:

00:00 Introduction to Building an AI-Enabled Web App
01:04 Setting Up API Keys
02:36 Initializing the Project
03:11 Creating the Podcast Generation Route
04:45 Handling Audio File Creation and Storage
06:07 Sending Requests and Handling Responses
07:12 Managing Context Windows and Final Steps
08:44 Crafting a Hilarious and Informative Podcast
09:17 Customizing the Model and Frontend Updates
09:43 Handling LLM Responses and Errors
10:32 Setting Up the LLM Podcast Engine
11:12 Managing URLs and User Inputs
11:44 Post Requests and State Management
13:24 Audio Handling and Event Listeners
13:55 Adding Animations and Final Touches
15:33 Conclusion and Final Thoughts
---
type: transcript
date: 2024-10-28
youtube_id: ievgM928RBc
---

# Transcript: Build AI Podcasts from Any Site: Full-Stack Guide with Firecrawl, ElevenLabs & Next.js

In this video I'm going to show you how to build out a full-stack, AI-enabled web app. We're going to crawl a bunch of different links that you can put in on the left-hand side; it uses Firecrawl to get the information from all of those links. Once we have that content back, we send it to an LLM to be processed and ultimately write a script, which we then send to ElevenLabs to generate a little podcast for us. By the end of the video I'll walk you through the front end, the back end, and everything you need to set this up yourself, so you really understand what's happening here.

We see that the tech news for today, Monday, October 21st, is the following, and then we have the information from those websites that we specified. You can really put whatever you want in here; I used TechCrunch, The Verge, and Y Combinator just to generate a quick little podcast. You'll also see it's set at 5 seconds; you can remove that limitation if you want full few-minute or five-minute podcasts on all the news you want to keep up on.

The first thing you'll need is an API key from Firecrawl. If you haven't used Firecrawl before, you can sign up for their free plan to try this out. It lets you ping different URLs, and there are a number of features in there, from crawling to scraping, but also returning clean markdown, which is really beneficial when you want to pass information from websites to an LLM. You can just make a free account, and once you're in, go and grab your API key. Next, we need an API key from ElevenLabs.
If you haven't used ElevenLabs before, it's essentially a text-to-speech service. There are a ton of different voices; you can do some fun things like cloning your own voice (or a celebrity's, if you'd like), or alternatively just pick from the absolute ton of preset voices they offer. You can use it from their interface or, in our case, from the API to generate the audio dynamically. Once you've made an account, go to the bottom-left corner where your name is, select API Keys, and generate a key; just stay on the page where you have the API key.

Finally, for the LLM inference portion I'm going to use the Llama 3.2 90B model, which is on Groq right now. The reason I wanted to use Groq is that you'll be able to generate this for free right now; they have a generous free tier for developers. Make a free account on console.groq.com, go to API Keys, and generate an API key. We'll be putting these into our application as one of the first steps.

In terms of the project, I'll put the repo in the description of the video, where you can pull it down. If you just want to install it, you can pnpm install or bun install everything, add your API keys, and you'll be off to the races. Alternatively, if you want to understand how the application works, just continue watching the video and you'll see effectively how everything works; it probably isn't as involved as you might think.

The first thing we're going to have is a .env file. Within the GitHub repository there's a .env.example, so just remove the ".example" and put in the API keys that we just walked through in the first step, one by one. Once we have that, we're going to have a route, which we'll call generate-podcast, and we'll import all of the required modules we're going to use in this example. In this case that's Firecrawl, and the OpenAI SDK pointed at Groq: Groq has an OpenAI-compatible endpoint, which means we can query it as if it were something like a GPT-series model and it will more or less conform to the OpenAI SDK spec; all we need to do is pass in the base URL as well as the model string. Next, we'll use the ElevenLabs client, again for the voice as I mentioned earlier, plus a few other packages. I set the application up to run locally on your machine. If you do want to serve it up, you'll have to use something like an S3 bucket, Tigris, or some other form of blob storage to save your MP3s in a hosted application, and that should only require changing a small block of code.
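The .env setup described above might look like the following. The variable names here are an assumption for illustration; use whatever names the repo's .env.example actually specifies.

```bash
# .env (copied from .env.example) - illustrative variable names
FIRECRAWL_API_KEY=fc-your-key-here
GROQ_API_KEY=gsk_your-key-here
ELEVENLABS_API_KEY=your-elevenlabs-key-here
```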
The first thing we'll do is load up our environment variables so we can reference everything we just added to our .env. From there, we initialize the Firecrawl client, passing in the API key. Next, we set up and initialize each of our clients: Firecrawl, OpenAI for the Groq endpoint, and the ElevenLabs client. In essence, all we're doing in each of these is passing the API keys into the SDK for our requests; in the case of OpenAI, we also specify the base URL so it points to the Groq endpoint.

From there, we have a few different helper functions. The first one creates the audio file from text: this is where we send in the text we get from the generated LLM response and pass it to ElevenLabs. You'll see that I hardcoded a substring to limit how many characters get sent. I'd encourage you to limit what you're sending to ElevenLabs at first, especially if you're on their free tier, because they give you 10,000 characters per month. In this case, just for testing, I used 800 characters, so you get five or six seconds of audio to make sure it's successfully generating for you. Once we have that, we generate a unique file name; we'll just base it on the date, though it could really be whatever you'd like. As soon as we get the response back from ElevenLabs, we write it locally. If you wanted to deploy this application, you'd just swap out these couple of things for something like an S3 bucket, Tigris, or some other form of blob storage, where you could store it with a unique key associated with a user.
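The character cap and date-based file naming described above can be sketched as two small helpers. This is a minimal sketch, not the repo's exact code; the function names and the exact naming scheme are illustrative.

```typescript
// Cap what gets sent to ElevenLabs so the free-tier character quota
// (10,000/month) isn't burned while testing. 800 chars ≈ 5-6s of audio.
function truncateForTTS(text: string, maxChars: number = 800): string {
  return text.slice(0, maxChars);
}

// Build a unique MP3 file name from the current date/time.
// Colons and dots are invalid or awkward in file names, so replace them.
function audioFileName(date: Date = new Date()): string {
  return `podcast-${date.toISOString().replace(/[:.]/g, "-")}.mp3`;
}
```

For a hosted deployment, the write-to-disk step that consumes this file name is the piece you'd swap for an S3 or blob-storage upload keyed by user.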
In this example, though, it's just a locally running application, and you only have to do a little bit of refactoring if you plan on using this in a production use case. After that, we add a little bit of error handling, and then we set up our actual POST request. Once we have the POST request, we destructure the URLs we receive from the request; in this case, the URLs are just the array of all the different URLs we put in the UI. As you saw in the starter example, the UI updates as portions of the back end work through the different steps, and the way we do that is by setting up a readable stream that we consume from the client side. The first thing in here is a little helper function that streams updates out to the front end, so everything working through on the back end can be streamed out and mapped to what we have in the UI logic, to see whether it's updating and so on.

Next, we send concurrent requests for all the different URLs in the array sent from the front end. Instead of doing this successively, we just make the requests and use a promise so that, as soon as all of them resolve, we combine all of the results. One thing I do want to mention with this application: if you put in a large number of links, you'll have to be mindful of the context window of the LLM you're using. I believe at the time of recording the context window for the Llama 3.2 90B model on Groq is 128,000 tokens, but just be mindful of that.
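The streaming-update helper described above can be sketched as follows. The `{ type, message }` shape and the function names are assumptions for illustration, not the repo's exact schema; the key idea is that each update is encoded as a server-sent-event frame and enqueued on the route's `ReadableStream`.

```typescript
// Assumed event shape mirroring the statuses the front end displays.
type Update = { type: "update" | "content" | "complete" | "error"; message: string };

// SSE frames are "data: <payload>\n\n"; the client splits on the blank line.
function formatSSE(update: Update): string {
  return `data: ${JSON.stringify(update)}\n\n`;
}

// Inside the route handler, a ReadableStream controller would enqueue
// encoded frames via a sender like this.
function makeSender(controller: { enqueue(chunk: Uint8Array): void }) {
  const encoder = new TextEncoder();
  return (update: Update) => controller.enqueue(encoder.encode(formatSSE(update)));
}
```

Each back-end step then calls the sender (e.g. with "Scraping URLs" or "Crafting witty commentary") so the UI can map messages to its status list.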
You could potentially need a further mechanism if you want to add a large number of documents; that's just something to keep in mind if you run into issues at inference time. Be mindful of what you're actually sending into the LLM, because you can make some subsequent tweaks to make this a bit more performant, like adding some RAG functionality to save on tokens if you're working with an LLM that has a smaller context window. This is just an area to be mindful of.

Now, if for whatever reason there isn't any length to the markdown we combined in the previous step, we just send a message to the user that no content could be scraped. From there, we send an update to the user that we're compiling all the different stories, and this is where we send the request to the LLM. Like I mentioned, you can use Groq, you can use OpenAI, you can use basically whatever you'd like; you can even swap out this block for any model that isn't OpenAI-compatible, and it's going to be relatively easy to swap in whatever you want. In my system message, I specified: you are a witty tech podcaster; create a 5-minute script covering the top 5 to 10 most interesting stories; summarize each story in one to four sentences, keeping the tone funny and entertaining; aim for a mix of humor. In our user message, we specify the date, which allows the script to say something like "It's Monday, [the date], here are the top tech stories," and we say "create a hilarious and informative 5-minute podcast." This can really be whatever you'd like: if you want it to be serious, or to focus on particular aspects, or to read in a particular tone, you can play around with this portion. That's the fun of something like this; you can really steer it however you'd like.
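The system/user message pair described above can be assembled like this. The prompt wording paraphrases the transcript, and the builder function is an illustrative sketch, not the repo's exact code.

```typescript
// Chat message shape used by OpenAI-compatible endpoints (including Groq's).
type ChatMessage = { role: "system" | "user"; content: string };

function buildMessages(date: string, combinedMarkdown: string): ChatMessage[] {
  return [
    {
      role: "system",
      content:
        "You are a witty tech podcaster. Create a 5-minute script covering the " +
        "top 5 to 10 most interesting stories. Summarize each story in one to " +
        "four sentences, keeping the tone funny and entertaining.",
    },
    {
      // Including the date lets the script open with "It's Monday, ...".
      role: "user",
      content:
        `It's ${date}. Create a hilarious and informative 5-minute podcast ` +
        `from these stories:\n\n${combinedMarkdown}`,
    },
  ];
}
```

These messages are then passed, along with the model string and `stream: true`, to the OpenAI SDK client whose base URL points at Groq.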
For the model string, you can swap it out if you want to use a different model, maybe one of the Gemma models; if you want to change to something else, you can swap it right here. In this case, we're specifying it to stream back. Then we send an update to the front end saying "crafting witty commentary," and this is where we loop through all the different chunks we get back from Groq and stream them to the front end of our application. Finally, once the LLM inference is done, we send the full LLM response into the function we declared earlier, which passes it to ElevenLabs for that text-to-speech step and ultimately saves the audio down. From there, we send out our final message that it's complete, which lets the front end of the application know it can update the audio state and so on. Then we just catch any errors from the previous steps, and finally we specify the response type for our POST request; in this case, we're streaming the event stream back to the front end. That pretty much concludes our back end.

From there, we move over to our page. Within the page, all we have is the LLM podcast engine component, and it is relatively long. In terms of the component itself, we're going to use a handful of different libraries, which I'll touch on as we go through them. The first thing we do is set up our LLM podcast engine component.
Within here, we have a number of hooks: whether it's loading, the new URL being added in the input bar, the status, and so on, plus whether the panel is expanded or we're on that initial screen with just the URLs. We also use a couple of refs. First, we check whether the URL is an actual URL; otherwise, we let the user know, "hey, you didn't put in a valid URL," and catch that before the request is sent to the back end. Then we have a handler to add the URL: if it's valid, we set it into our urls array, and this is the array that ultimately gets passed to the back end. Within the UI, we also have the ability to remove a URL; clicking the X filters that URL out of the list.

Next, there's the function that handles the lion's share of our application. It makes the POST request; it manages the loading state; it shows whether we're expanded or still in that initial URLs-only state; and it shows the new script, all the different step statuses, and the audio source once we get the response back from ElevenLabs and have the file saved to our machine. Within it, we make a POST request to the route we just set up, specifying the URLs in the body of the message. If there are any errors, we throw an error on the front end. Then we consume all of the responses we get back as events from our server: we set up a while loop, and while it's true, we go through and break out all the different messages we receive, specifying and updating the particular state based on the message type.
If it's a status update, that sets the various hooks in the front end of our application, just like I showed you the back end setting the current status. Alternatively, if it's the script, we get that streaming response back and concatenate all of the chunks into this state within our application. Finally, if it's complete, we set the current status to show that the audio is ready once everything is done. If there are any errors from the back end, we set those in the status as well, and within this function itself we have a little bit of error handling too, just to make sure that if there are any issues, we'll be able to see what's going on. That's most of the front-end logic.

From there, we have a simple play button, which uses the audio ref to play or pause the audio. We also have a function to download it: if you decide to build on this and use something like S3 or Tigris for more of a production use case, you can just specify whatever that link is and download it from there. We set up an event listener to listen for the different audio states, whether the audio is playing or loading, and that's what we use to wire up the audio event listeners. Finally, there's some simple animation that animates the background as soon as you send in the request, just to give it a bit more of a cool effect. This isn't a requirement by any means, and you can swap it out or remove it if you'd like. Since we have some animations, we're using Framer Motion; that's how we specify the animation as well as what we have passed in.
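The while loop described above, which consumes the event stream and updates state, can be sketched like this. The frame parsing mirrors the assumed `{ type, message }` server schema; the state shape and function names are illustrative, not the repo's exact hooks.

```typescript
type ServerEvent = { type: string; message: string };

// Split buffered stream text into "data:" frames and JSON-parse each one.
function parseSSEChunk(chunk: string): ServerEvent[] {
  return chunk
    .split("\n\n")
    .filter((frame) => frame.startsWith("data: "))
    .map((frame) => JSON.parse(frame.slice("data: ".length)) as ServerEvent);
}

// Apply one event to front-end state, mirroring the hook updates described
// above: status updates, script chunks concatenated, completion, errors.
type UiState = { status: string; script: string };

function applyEvent(state: UiState, ev: ServerEvent): UiState {
  switch (ev.type) {
    case "update":   return { ...state, status: ev.message };
    case "content":  return { ...state, script: state.script + ev.message };
    case "complete": return { ...state, status: "Audio ready" };
    case "error":    return { ...state, status: `Error: ${ev.message}` };
    default:         return state;
  }
}
```

In the real component, `applyEvent`'s branches would call the corresponding `setState` hooks instead of returning a new object.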
Then it's basically our JSX. The first thing we pass in with Framer Motion is that gradient animation, and we have a simple header to show "podcast engine." Within the main content there's a decent amount: we have the expansion that slides out as soon as we submit to the back end, slowly easing out so the panels sit side by side, and then we update the state. The rest is pretty self-explanatory: we have the input, the ability to add another URL, we map out all of the URLs we have in state, and we have the button to generate the podcast. If we've submitted, we show that it's generating the podcast with that intermittent loading state. From here, this is essentially the right-hand side of the application: if it's expanded, we show all the current statuses at the top, and as the tokens stream back from the LLM through to the front end, we update the new script. Once the audio is done, we have a simple HTML audio element (you could swap this out for something a bit fancier if you'd like); within it we show the loading state, and we also have the button to play and pause it. That's pretty much it; we have our download button, but in terms of the front end, the application is relatively straightforward.

I know this was a little bit more of a technical deep dive. I do like to do these on my channel to really show you, end to end, how you can set up these little LLM applications and walk through an idea from the back end to the front end, getting your API keys, and tying together different services like Firecrawl, ElevenLabs, and the models on Groq. That's pretty much it. I wanted to thank Firecrawl for partnering on this video, and I encourage you to check out their service. I'll put all of the links for everything you need in the description of the video. Otherwise, if you found this video useful, please comment, share, and subscribe. Until the next one!