
Transcribe & Pay in a Snap! | Full Stack Audio Transcription App with OpenAI & Stripe

Welcome to the world of audio transcription and online payments! In this video, we'll build a full stack web application that allows users to upload audio files, transcribe them using OpenAI's Whisper ASR API, and make payments using Stripe. Whether you're a podcaster, journalist, or just someone who loves tinkering with code, this tutorial is for you! We'll start by importing all the libraries and modules we need, including Express.js, Multer, Stripe, and more. Then, we'll configure file storage, set up middleware, and create endpoints for handling file uploads, transcription, and payments. We'll also add a front-end with Dropzone for file uploads and Stripe Elements for payments. By the end of this video, you'll have a fully functional app that transcribes audio files and charges users for the service. So grab your headphones, and let's get coding!

- Source Code: https://github.com/developersdigest/Build-a-Whisper-Stripe-App-Full-Stack-Tutorial
- OpenAI Whisper API: https://openai.com/blog/introducing-chatgpt-and-whisper-apis
- Stripe API: https://stripe.com/docs

---
type: transcript
date: 2023-04-25
youtube_id: 2HdykAw1ChQ
---

# Transcript: OpenAI's Whisper & Stripe App in 30 Minutes: Full Stack Tutorial

All right, in this one I'm going to show you how to build a simple full stack app that uses OpenAI's Whisper API for transcription and Stripe for payments.

Here's how it works. We drag an audio file into our interface, the app generates and sets a dynamic price, and then we enter credit card information. This is just Stripe's test credit card, so we can fill it in like we have here and click "Pay and transcribe." A loader then pops up showing that the payment was successful, and the app starts transcribing the audio through the endpoint. You can see it's pretty quick; this file is about the size of a two-minute video. We'll also set it up so it downloads a .txt file that you can reference as well.

We're going to build the back end in Node.js, and on the front end we'll use HTML, CSS, and JavaScript. We'll reference some libraries on both the back end and the front end. Without further ado, let's get into it.

The first thing I'm going to do is hop over to another Chrome window. It goes without saying that you'll need Node.js installed for this. If you're watching this video you likely already have it, but I'm mentioning it just to cover our bases. Once you have that, head over to platform.openai.com and open the API Keys tab. Generating a key is simple: click "Create new secret key," name the key, and copy it. You can leave that tab open; I'll show you where to put that API key in a second.

Once that's ready, we'll hop over to Stripe and set up an account. I'm not going to go through all the setup steps because it's pretty intuitive. Register as if you're a new business and go through the steps. Once you're set up and in the dashboard, click the Developers button at the top.

We'll use two keys from Stripe, the publishable key and the secret key, so let me quickly touch on them so they don't get confused. The publishable key is what goes in the front end of our application. It doesn't need to be kept secret, because it's how the front end talks to Stripe. The secret key is what we reference in the back end. It's private, so don't share it or push it to GitHub.

With those two tabs open, I'll hop over to VS Code. Before we open anything, we'll set up a couple of directories and some files. The first thing to run in your terminal is `npm init -y`, which initializes the project. Then create a file for the server (`touch server.js`, or just create it in VS Code), create an `uploads` folder (`mkdir uploads`), and create a `public` folder containing three files: `app.js`, `index.html`, and `style.css`.

Once all that is set up, we'll go into our `.env` file. With your Chrome tabs still open, copy over your key from OpenAI and make sure it's in the file as shown. You don't put the API key in quotes or anything; you just paste it after the equals sign, and the same goes for all of these values. So we grab that key from OpenAI. Then we head over to Stripe, reveal the secret key, and put it here. Finally there's our session secret. Express uses it to sign the session cookie, which is how we keep track of each user's file path. You can type some random characters or generate a value. There are a handful of ways to do this; just create a unique value for yourself and add it. Once you have those, save your `.env`.

With our directory set up, let me quickly show you our `package.json`. It lists a handful of dependencies: axios, body-parser, and so on. You'll need to `npm install` (or `npm i`) all of them. You can do that in one line in the terminal, for example `npm i axios body-parser` and so on. Just make sure they're all installed. If you have a typo or forget one, the terminal error later on should make it pretty clear, but hopefully that won't happen.

From there, I'll hop over to our server and start building it out. I'll make this full screen, make it a bit bigger, and close the terminal so we have it all on screen. It looks like a lot of steps, but we'll go through them pretty quickly. Also, I didn't mention this off the top, but if you want the files for this, to fork it or do whatever you want with it, there's a GitHub repository. You'll see it's a pretty crude example of how you can build an application like this, and there are lots of small things that could be improved.
You can take this and use it as a starting point for integrating something like it into a service you have or an idea you might have, so feel free to use it however you want. Hopefully you find it useful.

Without further ado, the first thing we're going to do is require Express. If you haven't used it already, Express is our web app framework. We'll use it to set up our endpoints and to serve the static files for the front end, so `index.html`, `style.css`, and `app.js` will all be served through Express. Next is body-parser, which parses the bodies of incoming POST requests so we can actually read them on the back end. It's often something you don't realize you need until you make a POST request and think, "Oh shoot, I need body-parser as well."

Multer is what we'll use to handle file uploads. Because the user drags a file into our interface, we also upload that MP3 or other audio file to the server, and Multer handles that. Then we'll use express-session to keep track of file paths. If multiple people are using your application, you'll want a separate session for each of them so that different files aren't mixed up between tasks. There are a handful of other reasons too, but that's what express-session is for here. Then we'll use dotenv, which we initialize with `.config()`. Requiring and initializing this library is all it takes to read our private keys and other environment variables.

From there we set up Stripe, which is where we reference the Stripe secret key. Keep your Stripe tab open in Chrome, because we'll come back for the publishable key once we get to the front end of the application. Then we require `path`, which we use to work with local file paths; it just makes things a lot easier. If you haven't used Express before, the standard convention for creating an Express instance is an `app` variable set to `express()`.

Then we're going to use axios. You don't have to use axios for this. I've just gotten familiar and comfortable with it over the years and quite like it as a library, so it's what we'll use for API requests. Next is form-data, which we use to actually send the MP3 or audio file to the OpenAI Whisper API. The OpenAI API expects file uploads in a particular format, and this is the easiest way I found to send audio files to the Whisper endpoint. Finally, we require `fs` to read and write files on our system.

Once everything is required, the first thing we do is set up Multer. We point it at the `uploads` path where we'll store our audio files. Then we set up the file names, in this case a timestamp plus the original name so that each file almost certainly gets a unique name we can reference. Then we initialize Multer.

Next come some housekeeping tasks to set up Express, body-parser, and so on. This line points Express at the `public` front end of the application so we can serve it straight to localhost, and these lines parse our POST requests like I mentioned. Once that's done, we set up the express-session middleware as you see here. This is where we reference the session secret from our `.env`.

Scrolling down, we create a transcribe-audio function. The first thing we do is set up a variable for our OpenAI API key and specify which Whisper model to use; in this example it's `whisper-1`. From there, we create a FormData object and append the required data. This is where we use the form-data library we required above to build the request that we'll ultimately send to the OpenAI API.

This part is a bit longer, but I'll run through it quickly. This is the actual endpoint for Whisper, and this is where we append our options: we send the file and the model. Then we set up the headers the API needs. As usual, we wait for the data to come back and catch any error. Depending on the outcome, we resolve or reject the promise, since the function returns a promise. Once the audio file is sent, we wait and hope it's successful.

From there we set up our upload endpoint. Here we specify that it handles a single file. It would be interesting if someone wanted to extend this application to handle a handful of files at once. You could split that DOM area into multiple sections, or add buttons or some other interface for downloading the various transcripts. But I digress. This is where we actually start using our session: we store the file path in the session. Then we send a response back to the front end saying whether the file was uploaded successfully. If it wasn't, we send back a message saying there was an error uploading the file.

Next we set up our charge endpoint. When the user clicks the "Pay and transcribe" button on the front end, it calls the charge endpoint, and we destructure a few values that we'll send on to the Stripe API. You'll see this as a theme: I try to follow best practices and handle errors in case anything comes up. So we set up a simple try/catch and create a charge through the Stripe API with the amount, currency, and source. Then we send back whether the payment was successful. Stripe responds very quickly with whether the credit card was accepted or rejected. If it's rejected, we send back that the payment failed.

From there, we set up our transcribe endpoint, which is where we actually transcribe the audio file. The logic is essentially: once the payment succeeds, go ahead and transcribe. Again, we set up a try/catch. If there's a file path in the session, we continue. Otherwise, we send back a message saying there isn't a file path, though hopefully we never actually get that error. Then we await the transcription of the audio and send a success message back with the actual transcription. This `transcription` key and value is what we append to the DOM, and it's also the text we put in the downloaded text file. Finally, we send back an error if the transcription fails, so we can see what's going on from the front end.

Then I set the server up on port 3000. It's a popular port, so if you already have something running on it, feel free to change it, but I'll leave it for now. Finally, we start the server. This is the usual convention for starting an Express server: you log a message saying the server is running on this port. The nice thing is that once you start it, you can open the URL right from the terminal; in VS Code, just Command-click it. That does it for our server.

Before going into `index.html` or `app.js`, I'll quickly hop into `style.css`. I have some styles here, but I'm going to assume the vast majority of people watching aren't here for the styling, so I won't go through it. If you want to use this as a boilerplate for whatever you're doing, grab the GitHub repository and use the whole thing as a starting point. Or just grab the styles if you're building out the code as you follow along.

From there I'll hop over to `index.html`. The HTML is pretty straightforward and broken up into a handful of sections; as you see here, it's split into steps. There's a little more to it than these three steps, but most of the logic lives in steps one, two, and three. In the HTML file, start with a basic HTML structure. In VS Code you can do that with Emmet: type an exclamation mark and press Tab. That gives you the basic `html`, `head`, and `body` tags and a couple of other things.
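Stepping back to the server for a moment: the transcribe-audio function described earlier sends Whisper a multipart form containing the audio file and the model name. Here's a minimal sketch of that request, not the repository's code. The video uses axios and the form-data package; this version swaps in the `fetch`, `FormData`, and `Blob` globals, so it assumes Node.js 18 or later. The endpoint `https://api.openai.com/v1/audio/transcriptions`, the `file` and `model` fields, and `whisper-1` come from OpenAI's API. The `OPENAI_API_KEY` variable name and the helper names are illustrative assumptions.

```javascript
// Sketch assuming Node.js 18+, which provides fetch, FormData, and Blob globally.
// The video uses axios and the form-data package instead.
const fs = require("node:fs/promises");
const path = require("node:path");

// OpenAI's speech-to-text (transcription) endpoint.
const WHISPER_URL = "https://api.openai.com/v1/audio/transcriptions";

// Builds the multipart body: the audio under "file" and the model name under "model".
// Keep the original extension in the filename so the API can tell which audio format it's receiving.
function buildWhisperForm(audioBuffer, filename, model = "whisper-1") {
  const form = new FormData();
  form.append("file", new Blob([audioBuffer]), filename);
  form.append("model", model);
  return form;
}

// Reads an uploaded file from disk, sends it to Whisper, and resolves with the transcript text.
// ASSUMPTION: the key is stored in an OPENAI_API_KEY environment variable.
async function transcribeAudio(filePath, apiKey = process.env.OPENAI_API_KEY) {
  const audio = await fs.readFile(filePath);
  const response = await fetch(WHISPER_URL, {
    method: "POST",
    // fetch sets the multipart Content-Type header (with its boundary) for FormData bodies.
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildWhisperForm(audio, path.basename(filePath)),
  });
  if (!response.ok) {
    throw new Error(`Whisper request failed (${response.status}): ${await response.text()}`);
  }
  const data = await response.json();
  return data.text; // the default JSON response carries the transcript in `text`
}
```

The transcribe endpoint would call `await transcribeAudio(...)` with the path the upload endpoint stored in the session, then send the returned text back as the `transcription` value. Because `async` functions already return promises, the manual resolve/reject wrapper described in the video isn't needed in this version.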
From there, I start including a few of the libraries we'll be using. I'm using Tailwind for this and just referencing its CDN. Like I mentioned, this isn't really a tutorial on styling, so feel free to use vanilla CSS or whatever library you normally use. In this case I'll use Tailwind to keep it simple. Next we include Stripe's front-end script from its CDN; here I'm using their v3 version. Then we add two more CDN links for Dropzone. Dropzone is the area here where we drag and drop the file; rather than coding all of that from scratch, I reached for the Dropzone library. From there, we reference our `style.css`, which I just went through, and then we get into the application itself.

First, we add some utility classes, starting with our `h1`. If you're not familiar with Tailwind, these are classes you add directly in your HTML to get a lot of styling for free. Then we have the sections you see in the GUI here. Before going through those, don't forget to include the script tag for our local `app.js`, which we'll go through next.

Step one is the upload. In the upload section, we have a simple form, and inside it a div with the Dropzone ID. You can change the message if you want, but this is the basic Dropzone example that gets you going.

Then we head down to the pay section of the HTML. This is an example you can grab from Stripe, with Tailwind classes added, and it includes just the credit card fields you need. It's pretty straightforward: the card number, expiration date, ZIP code, and so on. You just have to make sure you wire it up when you set up the JavaScript in the app. It's pretty flexible and not very opinionated; it doesn't add its own styling or containers, so you can integrate it however you like. If you're using Stripe, you should be able to match the styling of your existing application almost exactly. We also have the "Pay and transcribe" button here and the spot where we display the price.

There are a few pieces going on here, but let me quickly jump down to the transcription section. This is where the loader pops up, where we show whether the payment was successful, and finally where the transcription result appears once it comes back. That's it for the HTML. It's pretty straightforward, and virtually none of it does much until we get to the next part.

The next and last step is the front-end JavaScript. First, we initialize Dropzone; I'll break this up a little. We create a new Dropzone instance for the drop area, referencing it by the ID you see here. Then we specify the endpoint to upload to, which is the upload endpoint we created on the server. Then I specify that it accepts one file. The accepted file types here aren't random audio formats; they're the formats the Whisper API accepts. If you want to allow only certain types for whatever reason, go ahead and change them, but as a starting point, here's the full list. Similarly for file size: the Whisper API accepts files up to 25 MB, which is why we set a 25 MB cap in Dropzone.

Next we have an `init` function, which Dropzone lets us define, and we use its file-added callback to log that a file has been added. I have a handful of console logs on the front end for anyone following along. If you run into errors at any point, I recommend opening your browser console. The logs will show any errors, or show that things are progressing as they should.

From there, I threw together a very simple pricing scheme based on file size. If the file is under 5 MB, we set the displayed price in the HTML to 50 cents. The Stripe API takes the amount in cents, so we specify 50, 100 (a dollar), or 150 (a dollar fifty). So a file under 5 MB costs 50 cents, a file between 5 and 10 MB costs a dollar, and so on. You could get a lot more creative with this and make it much more dynamic, for example a calculation based entirely on file size. If you want to charge 68 cents for a file, you can write your own function here to do that.

From there, we handle the success event and simply log it: if the upload succeeds, we log the response. I want to pause here. While building this, I had to decide whether to let users upload a file before their payment is accepted. I once heard a podcast with one of the founders of Instagram. I don't remember his exact words, but one of their early tricks was to start uploading and processing photos before the user even clicked post, so the app felt faster. That was part of why I architected it like this, so users can upload files even if they don't end up paying. You could change it to start uploading only once a payment succeeds, but I wanted to call it out.

Next, we set up a simple function for transcribing the audio. Inside it, we make a fetch request to our transcribe endpoint and await the response; this is where most of the front-end logic lives. Once the response comes back, we store the result in our `data` variable. If there's an error, we log it. Otherwise, we hide the loader, append the transcription to the DOM like we saw, and finally download the transcription. The download function is a quick little helper we'll build in a moment. Finally, an outer catch handles any error from the fetch request itself.

This is our simple, hacky way to download the file. There are a handful of ways we could have done this, but this is a quick example to get the gears turning. It simulates clicking an `a` tag so the browser downloads the file, which is how we get the transcription file that pops up. It's a quick-and-dirty approach you can use if you want.

Next, we cache a couple of our elements and create our on-payment-success function. If the payment succeeds, we remove the `hidden` class from our loader so it shows the transcription is in progress. Then we call the transcribe-audio function we just discussed.

Finally, there are a handful of things to do with Stripe. This is where we put our Stripe publishable key. Here we create our card element and mount it onto the fields we set up in the HTML. From there, we get the payment form, and on submission we create a Stripe token. If there's an error, we log it; otherwise, we call our on-payment-success function. Just to recap, since I know we're jumping around: the on-payment-success function calls the transcribe-audio function, and that's the last step, assuming everything succeeds. Once that's done, open your terminal, run `node server.js` to start the server, and your application should be working.
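The file-size pricing tiers described above can be pulled out into a small pure function. This is a sketch under stated assumptions, not the repository's code. It assumes 1 MB = 1024 × 1024 bytes and that exactly 5 MB falls in the $1.00 tier. It also assumes the $1.50 tier covers everything above 10 MB up to Whisper's 25 MB limit, since the video only says "etc." `priceInCents` and `formatPrice` are hypothetical names.

```javascript
// Pricing tiers from the video, in cents (Stripe amounts for USD are in cents).
// ASSUMPTIONS: 1 MB = 1024 * 1024 bytes, exactly 5 MB falls in the $1.00 tier,
// and the $1.50 tier covers everything above 10 MB up to Whisper's 25 MB limit.
const MB = 1024 * 1024;
const MAX_BYTES = 25 * MB;

// Hypothetical helper: returns the price in cents for a file size in bytes.
function priceInCents(sizeBytes) {
  if (!Number.isFinite(sizeBytes) || sizeBytes <= 0) {
    throw new RangeError("File size must be a positive number of bytes.");
  }
  if (sizeBytes > MAX_BYTES) {
    throw new RangeError("File exceeds Whisper's 25 MB upload limit.");
  }
  if (sizeBytes < 5 * MB) return 50; // under 5 MB: $0.50
  if (sizeBytes <= 10 * MB) return 100; // 5 to 10 MB: $1.00
  return 150; // over 10 MB: $1.50
}

// Hypothetical helper: formats cents for display, e.g. 150 becomes "$1.50".
function formatPrice(cents) {
  return `$${(cents / 100).toFixed(2)}`;
}

console.log(formatPrice(priceInCents(7 * MB))); // prints $1.00
```

Keeping the tiers in a pure function means the server's charge endpoint could compute the amount from the uploaded file's size, using the same logic as the browser. The server then wouldn't have to rely on an amount posted by the browser.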
You can test it out with an audio file you have; I'll include the audio file I used in this example in the repository. Otherwise, you should be off to the races. If you found this useful, please like, comment, subscribe, and share. Until the next one!