
In this video I explore the new ChatGPT image capabilities that have begun to roll out to ChatGPT Plus subscribers over the past week. I demonstrate how well the GPT-4 multimodal vision capability can recreate websites from screenshots of iconic sites such as Google and Netflix, and how well the model performs at turning a napkin sketch into a website. I show how well it builds Tailwind and Next.js JSX, as well as plain HTML, CSS, and JavaScript. https://www.patreon.com/DevelopersDigest
---
type: transcript
date: 2023-10-04
youtube_id: i4CMX6tsGc0
---

# Transcript: ChatGPT Can Now See... Can It Build a Website?

Right, in this video I'm going to be showing you the new ChatGPT image capabilities. The image capabilities were originally announced when GPT-4 came out, and they showed a number of examples of different screenshots, as well as a napkin sketch, generating a functioning website. So I'm going to take a similar approach here and see how it actually works in the wild with a couple of examples that I throw at it.

In addition to the release of the vision functionality, they also released the ability to speak with ChatGPT and have it talk back to you within their iPhone and Android apps. I'm going to be demonstrating that in a future video, but in this one I'm focused solely on the image portion. In the blog post they mention that image understanding is powered by multimodal GPT-3.5 and GPT-4, but at the time of recording I only have access to the image functionality in GPT-4, so that's the only model I'll be using in these examples.

I'm going to throw three different examples at it. First, I'll ask it to generate a front-end version of the iconic Google site that we all know and have all used. Next, I'll have it try to generate the landing page for Netflix, which again is pretty iconic, though not as iconic as Google obviously. And finally, I'm going to throw the equivalent of a napkin drawing at it: a simple navigation bar with Developers Digest, Contact, and YouTube, a recent videos section, and a bit of a slider. I just wanted to pass this to it and see what it would do. So I'm going to take three different images to generate from, and I'm also going to take a couple of different tech stacks to implement them: one with Next.js and Tailwind, and then some with just a raw index.html with HTML, CSS, and JavaScript.

So the first example I'll have
it generate, I'll just drag over the Google screenshot here and say, "generate me JSX and Tailwind to recreate this." I gave it pretty broad instructions and kept it very simple; we'll see what it comes back with. Because we're using GPT-4 and not something like GPT-3.5 Turbo, it will take a moment to run through. While that's generating for us, we'll go ahead and create a Next.js app here. I'll keep all the basic settings, have it install for us, cd into our app, and then find the directory and the page that we're going to be putting the code in once it's finished.

The other thing I wanted to talk about while this is running is that I think this will become particularly interesting when they enable uploading images to their API. It's obviously useful and powerful to have this within ChatGPT, but unlocking something like it for developers will create an abundance of use cases, as you can imagine. Imagine you had an idea for incorporating the multimodal capability into your application: being able to actually ping the OpenAI API, pass it an image, and have it return code or natural language. It goes without saying how useful something like that would be. They do have an upcoming developer day on November 6th, and I wouldn't be surprised if the ability to use the multimodal capability from their API gets announced during that conference.

So we see here we still have our Google homepage component being rendered. While that's rendering, we might as well go ahead and run our server, so I'll say `bun run dev`, and now we have our localhost up here. We'll have that compile, and now we should have our component. I'm just going to grab it from the return statement, and I'm going to replace everything within the return statement of
our application, where you see we just have the boilerplate for Next.js. Now if I go ahead and save that: look at that. One shot, one line, and it was able to generate, not exactly what was there, but something pretty close to the iconic landing page we're all familiar with. So right off the bat, kudos to the implementation here.

Now the next one I'm going to try is the Netflix homepage, so I'll say, "now generate me Tailwind and JSX for this." The thing I'm curious about is how well GPT-3.5 will perform compared to GPT-4. Because GPT-4 is obviously their more powerful model, I'm curious to see whether the GPT-3.5 model will be able to create something like this in one go, or whether it can only manage maybe a wireframe, and what the limitations between the two are. Once that functionality is released to me, I'll do a comparison and show you the differences between their models in the realm of the vision capability in particular.

The other thing I wanted to show you: if you want to capture a particular component on a website, say you like the look of this block-level section here on Netflix, you can obviously take a screenshot of your whole screen and then crop it out. But a nice little trick is to open up the inspector, find the wrapper of what you want to capture, right-click on the DOM element, and capture a screenshot right there. You can see I have a number of different screenshots here, and this shows the image of that section. That relates to a thought I had: if you try to throw too much at it, it's obviously going to have a harder time, given that there is a context window and length that you have to contend with. So if
you're trying to generate, say, this whole Netflix site, it's going to have a hard time generating the homepage. But if you feed it bit by bit, block-level section by block-level section, or even just a particular button, you can get as granular as you want, and I would imagine that the smaller the example you feed it, the better it will perform.

So now we'll hop back to our Netflix example. Again, we'll take the return statement just like we did in the Google example. I'll close out this Netflix example and replace our return statement here, and there we have it. We obviously don't have the background image, but if I drag this up and put it side by side, comparing it just like we did with the Google image, we see that we have the general implementation of what was there. Taking a look, there's not actually functioning code: there are no hooks bound to show you the dropdowns or anything like that. It didn't get all the colors right either, which is sort of interesting: it got the red right here on "Get Started," but not on the sign-in button. But overall, considering I just threw this screenshot at it, I'd say it does a pretty good job.

Last but not least, I'm going to throw my basic sketch at it and say, "generate me an index.html with HTML, CSS, and JavaScript of this drawing." One thing I found in the run-up to creating this video, having tried both index.html examples and Tailwind plus Next.js examples, is that the Next.js examples using Tailwind classes gave the fastest responses. So even though GPT-4 does take a minute or two compared to GPT-3.5 Turbo, when you're asking for all of the CSS and not leveraging a
library like Tailwind, it's going to take a little more time to write out all those styles for us. But if we take a look at what it's generating from the image here: the one thing I noticed is that I have "Developers Digest" as the title, and it took it as "Developer Digest," which is interesting. If I look through the styling, they're pretty broad styles; it will be interesting to see how well they're applied once it writes out the HTML. So it's going through, it's creating a header for us. Again, it had that minor mistake where it's not picking this up, but I have terrible handwriting, I'll be the first one to admit that.

What we see here, which is interesting and different from the other examples, if I revisit what I asked for, a representation of the drawing with HTML, CSS, and JavaScript, is that as it's putting out the response it's actually emitting function invocations on the onclick of the arrows. So it does appear it might try to implement the slider, and we see it's just implementing that initial functionality: it's putting in the onclick handlers, but I'd probably have to ask in a follow-up response, say, to use a Slick slider or something like that. In this example, though, I'm just going to show you the first response and what that looks like. I'll paste that into the index.html here, start a live server, and we see we have a simple but functioning website. We'd have to go in and tweak this a little obviously, and it's much more rudimentary when we compare it to the other examples, but it's a neat trick for getting some initial wireframing of a website if you're interested in that.

Of these three examples, I'd say the Google example was probably the one that was most spot-on. Now if you'd like to see other
examples with the ChatGPT vision functionality, let me know in the comments below. I plan to make another video on this, like I mentioned, once I have access to the GPT-3.5 multimodal capability. Otherwise, if you found this video useful, please like, comment, share, and subscribe, and consider becoming a Patreon subscriber as well. Until the next one.
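The onclick arrow handlers mentioned for the napkin-sketch slider can be illustrated with a minimal sketch. This is my own hypothetical reconstruction of that kind of initial functionality, not the model's actual output; the function names, slide count, and wrap-around behavior are all assumptions:

```javascript
// Minimal slider state, assuming a fixed slide count and wrap-around
// navigation. In the generated page, functions like these would be
// wired to the onclick attributes of the left/right arrow elements.
const totalSlides = 3; // hypothetical number of slides
let currentSlide = 0;

// Advance to the next slide, wrapping back to the first.
function nextSlide() {
  currentSlide = (currentSlide + 1) % totalSlides;
  return currentSlide;
}

// Go back one slide, wrapping around to the last.
function prevSlide() {
  currentSlide = (currentSlide - 1 + totalSlides) % totalSlides;
  return currentSlide;
}
```

A real implementation would also toggle visibility of the slide elements on each call, which is the kind of follow-up you'd need to prompt for.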
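To give a concrete feel for the kind of Tailwind markup these prompts return, here is a hypothetical, stripped-down sketch of a Google-style search box. The class names are illustrative guesses of common Tailwind utilities, not the model's actual output, and it's written as a plain HTML string so it runs without a build step; in the Next.js version, the same classes would sit on JSX elements inside the page component:

```javascript
// Hypothetical Google-style search box using Tailwind utility classes.
// Returns an HTML string; a JSX version would use className instead.
function searchBox(placeholder) {
  return [
    '<div class="flex justify-center mt-8">',
    `  <input type="text" placeholder="${placeholder}"`,
    '    class="w-96 rounded-full border border-gray-200 px-5 py-3 shadow-sm hover:shadow-md focus:outline-none" />',
    "</div>",
  ].join("\n");
}
```

The point of the video's experiment is that GPT-4 produces a whole page of markup in this style from a single screenshot, with no class names supplied in the prompt.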