
In this video, I will show you how you can build your own mini front-end developer by having Puppeteer parse a webpage and a selector and having GPT-4-V generate working Tailwind JSX React/Next components from simply a URL and a selector. Repo coming soon!
---
type: transcript
date: 2023-11-07
youtube_id: h30sjT4QgFI
---

# Transcript: GPT-4-Vision and Puppeteer

In this video, I'm going to be showing you the new vision multimodal capability within the GPT-4 API. What I'm going to be setting up here is a small Node project that will allow you to put in a URL and a selector, and it will generate a JSX file with all the Tailwind classes for a particular component. It's going to be a relatively quick example, but it's going to show you how you can get up and running with GPT-4V, or GPT Vision.

The first thing that we're going to do is make a new project. I just went ahead and ran bun init, and from there you can declare whether you want to use JavaScript or TypeScript or what have you. Once you have that, make a .env file, and within the .env put in OPENAI_API_KEY. Go ahead and grab an API key from platform.openai.com, and then you should be up and running.

Once you have all that, we're going to go into our index.js and go through the steps here. The first thing we're going to do is install a handful of dependencies. We're only actually going to be using Puppeteer and OpenAI, so just go ahead and bun install both puppeteer and openai. Once you have that, we're going to import them like you see here. Then at the top, I'm going to set up a little configuration object just to make it easier to use and for demonstration's sake, so you can change this if you want once you actually get into it. And I will mention that I'm going to be putting up a GitHub repository link for all this if you want a starting-off point for using GPT-4V.

First, Puppeteer is very asynchronous, so we're going to be awaiting a lot of things, and we're just going to wrap it all within an asynchronous function here. Now the first thing that we're going to do is launch a Puppeteer browser, and if you
haven't used Puppeteer before, it's essentially a Chromium instance, a synthetic browser that's going to be running in the background and performing the actions like you would if you were actually using a web browser. There are a lot of really neat things that you're going to be able to do with Puppeteer if you're interested in diving into it more. In this case, essentially what we're going to be doing is taking screenshots of either components or full pages.

I'll just run through these all here. The first thing that we're going to declare is the browser itself. We're going to declare that we're going to open it in the headless "new" mode; you can also set this up to be headless false if you actually want to see the actions that are taking place within Puppeteer. We're going to wait for a new page to load, and we're going to set our viewport, so this is somewhere on the order of a 15-inch MacBook Pro, not with Retina. The thing to note with GPT-4V is that the larger the image, the more tokens you're going to be using, so just be mindful of that.

Once this is all set up, and we have the synthetic browser loaded up at the size that we want, a new page, and whatnot, we're going to go to the configuration URL and then wait for the network to idle. Essentially, all those XHR requests and network requests that are coming in, we're going to wait for those to stop before it actually takes the subsequent actions here.

Here we're going to get our directory all ready. Everything's going to be saving out to this website folder in this example, so we're going to be saving out both the screenshot that we take and the JSX that it generates, if any. We're going to set it up in a way where it organizes them stacked one by one, so it's going to have the image and then the JSX file, and it's going to be organized based on the timestamp
and then, essentially, the URL of the website.

First we're going to check whether the configuration selector actually has a value. It will work if you just want to take a screenshot of the full web page, but the thing with a full web page is that, say it's a long web page and you're trying to take a screenshot of that and pass it to GPT-4V, you might not get as much bang for your buck if you go ahead and do that. But you definitely can. In another video, when this first came out in their GUI, I did some examples taking the Netflix homepage and the Google homepage, and it did a decent job at doing the layout, so you do have that option with the setup here.

First we're going to wait for that selector to appear on the screen. Puppeteer has a lot of really nice methods, this being one of them, where you can actually wait until that element is visible. With a lot of modern frameworks and things being rendered on the fly, portions of a website might load at different times, so you can actually wait for a particular selector. Say you see something on a website and you're like, I like that button, I like that input, I like that chat area bar, whatever you're trying to put in here, you can use this method and actually wait for the selector, which is super handy.

Then we're just going to check: if the element exists on the page, we're going to take a screenshot of the element and log that out. We're going to be logging out a lot of things just so you can see the process and the steps that we're taking to do all this. If the element wasn't found, we're just going to log that out as well. And if there isn't a selector within the configuration object, we're just going to take a screenshot of the page. After that, we're going to clean up that Chromium instance; we're going to
just call browser.close() here.

Then for the GPT-4V endpoint, you can pass in a URL, so if you were to upload this to an S3 bucket or something like that, you could pass that URL to OpenAI's API. Alternatively, you can do it just like this with base64: within Node you can just convert it to base64 and send it like this. Then we're going to initialize our OpenAI client and log out that we're sending this to the API.

I have a few things within the content description that I'm passing in, and the reason for that is that when I was initially trying this out, I sent in a screenshot of Google and said "recreate this page," and the responses I was getting back were "I can't do that, this is copyrighted material," et cetera. So I massaged that message a little bit to try to remove any proprietary or copyrighted content, essentially all of the things that it was saying it can't do. This is more just for demonstration's sake; I'm not trying to infringe on anyone's copyright, obviously. If you run into that, this is a system message that you can pass in. The system message should be weighted pretty high, so hopefully it should listen to that, but to make it even more prominent, you can also pass in some of the things that you mentioned within the system prompt as well. So I'm going to say: make a high-level Tailwind and Next.js component based on the screenshot of this website, remove copyrighted material, only return valid JSX.

I said "high-level" because, in playing around with this a little bit, it's not going to replace front-end developers, for instance. It's not going to be able to take in every little bit of a website and render it exactly as you want, but it does a pretty good job. It will give you a good starting-off point, and from there you can go in and make the tweaks that you need.

So for the way that we pass in the image URL, we're
just going to specify that it is the type image_url, and then we're going to be passing in that base64 screenshot right here. I'm going to put in a max token output of 300; you can increase or decrease this as you see fit. If you're doing full screens, it will probably run out, or rather it will stop before the output is actually complete, whereas if you're just using small buttons or text areas, from trying this out, it should be fine with 300 tokens.

Here we're going to extract from the OpenAI response everything within that jsx block. If you didn't just want JSX, and you wanted to generate all sorts of different components for a front end or back end or application or what have you, you can make a more sophisticated version of this. Since in this example I'm just asking for JSX, I'm going to be matching solely on the JSX here. If it finds the JSX content, we're going to make our path for that file, and if that JSX content is within the response message, we're going to save it out. Otherwise, if it doesn't have a valid JSX response, we're just going to log out that error here.

I'm going to go ahead and save that, and I'm going to pull over a couple of websites for examples. If I go right back to the top here, we have Google and we have Nav. Now, a quick way that you can go in and grab the selector that you want: just open up the inspector and look for a unique class. If you can't find a unique class, you might have to use something like nth-of-type or what have you, or get a little more creative with how you're actually going to be selecting that. But in this example, let's say I just want to target this whole text area box. If you get the class name and then do a Command-F, or Control-F on Windows, you can paste in that class and see if it only has that one
element. Here I see there's just that one class on the page, so this should be safe to use for my selector. You'll put it in as a selector just like you would using CSS or document.querySelector, and we can save that out here.

If we just go ahead and bun index.js, we'll run that, and we'll see, okay, we have a screenshot of the page, and then pretty quickly we have the initial JSX for our little component here. It's even called search component, and we see all the familiar Tailwind classes here.

So you can go around and play with this and see what you can do with it. I just thought this would be a quick and interesting demonstration of just one of the many different possibilities that you can use with GPT-4V. For the rest of the week, as well as the next few weeks, I'm going to be covering a lot of new content, technical tutorials in Node and Bun and also Next.js, on how you can leverage a lot of the new features within OpenAI's announcements that they had yesterday. If you're interested in this type of content, please like, comment, share, and subscribe, and otherwise I'll see you in the next one.
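
The transcript describes the code on screen but never reproduces it, so what follows is only a minimal sketch of the flow as narrated, not the author's actual index.js. Several things are assumptions: the helper names (baseName, extractJsx, buildMessages, run), the 1440x900 viewport, the wording of the system message, and the model name gpt-4-vision-preview, which was the vision model around the date of the video and may no longer be available. Newer Puppeteer releases also treat the headless "new" value differently, so that option may need adjusting.

```javascript
// Hypothetical configuration object; the video only says it holds a URL and a selector.
const config = {
  url: "https://www.google.com",
  selector: "", // empty string means "screenshot the full page"
  outputDir: "website",
  maxTokens: 300,
};

const FENCE = "`".repeat(3); // built at runtime so this sample contains no literal fence

// "<timestamp>-<host-and-path>" so the screenshot and the JSX file sort next to each other.
function baseName(url, timestamp = Date.now()) {
  const { hostname, pathname } = new URL(url);
  const slug = (hostname + pathname)
    .replace(/[^a-z0-9]+/gi, "-")
    .replace(/^-+|-+$/g, "");
  return `${timestamp}-${slug}`;
}

// Return the body of the first jsx fenced block in the reply, or null if there is none.
function extractJsx(text) {
  const pattern = new RegExp(FENCE + "jsx\\s*\\n([\\s\\S]*?)" + FENCE);
  const match = pattern.exec(text ?? "");
  return match ? match[1].trim() : null;
}

// Chat messages: a system message (wording assumed) plus text and base64 image parts.
function buildMessages(base64Png) {
  const instruction =
    "Make a high-level Tailwind and Next.js component based on the screenshot " +
    "of this website. Remove copyrighted material. Only return valid JSX.";
  return [
    { role: "system", content: "Replace proprietary or copyrighted content with placeholders." },
    {
      role: "user",
      content: [
        { type: "text", text: instruction },
        { type: "image_url", image_url: { url: `data:image/png;base64,${base64Png}` } },
      ],
    },
  ];
}

async function run(cfg = config) {
  // Loaded lazily so the pure helpers above work without these packages installed.
  const { default: puppeteer } = await import("puppeteer");
  const { default: OpenAI } = await import("openai");
  const fs = await import("node:fs/promises");
  const path = await import("node:path");

  await fs.mkdir(cfg.outputDir, { recursive: true });
  const base = path.join(cfg.outputDir, baseName(cfg.url));

  const browser = await puppeteer.launch({ headless: "new" });
  let png;
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1440, height: 900 }); // assumed size
    await page.goto(cfg.url, { waitUntil: "networkidle0" });
    if (cfg.selector) {
      // Throws on timeout if the element never becomes visible.
      const element = await page.waitForSelector(cfg.selector, { visible: true });
      png = await element.screenshot({ path: `${base}.png` });
    } else {
      png = await page.screenshot({ path: `${base}.png`, fullPage: true });
    }
    console.log(`Saved screenshot to ${base}.png`);
  } finally {
    await browser.close(); // always clean up the Chromium instance
  }

  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
  console.log("Sending screenshot to the API...");
  const response = await openai.chat.completions.create({
    model: "gpt-4-vision-preview", // assumption; see note above
    max_tokens: cfg.maxTokens,
    messages: buildMessages(Buffer.from(png).toString("base64")),
  });

  const jsx = extractJsx(response.choices[0]?.message?.content);
  if (jsx) {
    await fs.writeFile(`${base}.jsx`, jsx);
    console.log(`Saved component to ${base}.jsx`);
  } else {
    console.error("No JSX block found in the response");
  }
}

// To try it: run().catch(console.error);
```

To try it, install puppeteer and openai, set OPENAI_API_KEY, and uncomment the last line. Bun loads a .env file automatically; under plain Node the variable would need to be exported some other way. One difference from the narration: waitForSelector throws on timeout rather than returning nothing, so the "element wasn't found" branch shows up here as an exception rather than a log line.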