
In this video, I demonstrate the new open-source Screenshot-to-Code project, which lets you upload a photo, whether a full webpage or a basic UI component, and generates the HTML with Tailwind classes to render the result. I show how to set it up locally and walk through some of the outputs it generated, such as a Wikipedia page. I also give a quick example of taking a photo of something like the ChatGPT input area and tweaking the generated UI if further refinements are needed.

Links:
- https://github.com/abi/screenshot-to-code
- https://www.python.org/downloads/
- https://python-poetry.org/docs/
- https://platform.openai.com/api-keys
- http://localhost:5173/

Clipboard tips and tricks:
- Windows 10 and later: Use Windows + Shift + S. This activates the Snip & Sketch tool, allowing you to drag and select a portion of your screen. The selected area is copied to the clipboard automatically.
- macOS: Press Command + Shift + 4. This changes your cursor to a crosshair, letting you select a portion of the screen. Hold Control while dragging to copy the selected area to the clipboard instead of saving it as a file.
- Linux (GNOME desktop environment): Use Shift + PrintScreen. This lets you select an area of the screen to capture. The screenshot is copied to the clipboard.

Support the channel:
- Patreon: patreon.com/DevelopersDigest
- Buy Me A Coffee: buymeacoffee.com/developersdigest
- Website: developersdigest.tech
- GitHub: github.com/developersdigest
- Twitter: twitter.com/dev__digest
---
type: transcript
date: 2023-11-26
youtube_id: kDvbiQ7_Unc
---

# Transcript: GPT-4-Vision: Convert Screenshots to Code Instantly

In this video I'm going to be showing you Screenshot-to-Code, a new open-source project that leverages the GPT-4 Vision API: you send in a screenshot or a photo of something you'd like to create, and it generates the HTML and Tailwind classes for whatever you're looking to build. I'll just show you the short demonstration they have here on the GitHub repository, and as you see, it starts to render out all that HTML for you as the response comes back from the AI endpoint.

To get this all set up you will need Python installed. I'd imagine most people watching the channel already have it, but if you don't, or you're getting errors, that's the place to start looking. Once you know Python is installed and set up, the other thing you'll need is the Poetry library for this to work. With those installed, you can pull down the repo; there's a very straightforward getting-started guide. The way it's set up, you have both your backend and your frontend within the repository, but they are separated. You'll first have to grab your API key from OpenAI, so I'm just going to close a couple of tabs here. Go to platform.openai.com, open API Keys, and create a new API key, and when you run these commands, make sure your sk- key is within this line. Also make sure you're within the backend directory. Assuming you have Poetry installed, you can then run the server.

The other thing to note, since the backend and frontend are separated, is what I do within my VS Code: I have two terminals split side by side, with my frontend on the left-hand side and my backend on the right-hand side. This lets me keep them both in the terminal and toggle back and forth; to split them, just click on the terminal and click "Split Terminal". Once everything is installed, make sure both are running: the `poetry run` command for the backend and the `yarn dev` command to get the frontend started. Once it's all loaded up, you can go to localhost and see the Screenshot-to-Code frontend interface.

One thing I wanted to touch on before I forget is the cost of usage. When you saw that initial demonstration, with the animation going back and forth, it looks like it could be doing a lot. The output for the Wikipedia page I tested was 12 cents, just to give you an idea of how much this costs. You can also go back and make iterations, but this page was one shot: I didn't send in subsequent prompts or anything like that, it was one go, and 12 cents is what it cost. So that gives you a general sense. I'll just close out a couple of other tabs here.

Now, one thing I wanted to touch on for all users: I'm using a Mac, but if you're using Windows or Linux or what have you, here are the different commands that I
found useful to take a screenshot and have it in my clipboard. I assume most people know how to take a screenshot, but you might have it saved to a particular directory and then have to go and find the file, etc. What I found useful on Mac, if I just demonstrate it here: if I press Command+Shift+4, I get a little crosshair cursor that pops up on my screen. I'll just move it to the left-hand side here, and then if you hold Control while stretching over what you want to capture, so I'll just demonstrate with that text there, it will copy it straight to the clipboard. That's convenient if you just want, say, a button or an input that you like. Let's say we like this ChatGPT interface here: if I pull that up, hold Control, and select just this input bar, I can take a screenshot of it. The Wikipedia page is what I showed you before; I simply dragged across the screen, and once it was in my clipboard, it pasted in. This is what it actually generated, the raw HTML file. Everything is within one file for this initial repository, so you'll generate everything within that index.html. If you want to make a React component or whatever, you'll have to go in and tweak this library itself to generate those things. The one thing you're going to lose, or have to do a fair bit of work to set up with something like React, is that you'll have to plug in a compiler and whatnot to actually have it render. I think that's why you see a lot of these plain-HTML demonstrations: it's really quick to show what's loading without having to compile things as they stream back. So if I just go and
refresh the page here: from the other example I had my input from ChatGPT, so on Mac I'll just paste it in with Command+V. I have just the input box, and that's the other thing with this: even though the demonstrations and some of the examples showed these larger web pages, you can also use smaller examples. I put in that little one, and it's sort of breaking out, so you can instruct it a bit further and say "the icons are breaking out of the input bar." And if we actually look at the code here, you can see it's using Font Awesome, so let's just send that in. You can also send in the screenshot again: for subsequent requests, I believe it's just sending this within the context window, but if it does a really terrible job and you just want to say "hey, try again," you can toggle this on. In examples like this, if I just try it, let's see how it does... and now we see it's within that box. So even if it's not perfectly correct, you can iterate from there. It's similar to Vercel's new product, which I think is still in beta, v0, where you can generate these components on the fly with natural language. Pretty neat stuff. I expect a lot of these types of projects to start coming out over the coming weeks and months, especially with GPT-4 Vision now publicly accessible. But that's it for this video; I just wanted to give you a quick demonstration of this Screenshot-to-Code project. Pretty neat. I encourage you to go to the repo and star the repo, and shout out to all the contributors here, excellent work. That's it for this one, so if you found this video useful, please like, comment, share, and subscribe, and otherwise, until the next one.
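As a rough illustration of the kind of single-file, Tailwind-styled markup described above, here is a hypothetical ChatGPT-like input bar. This is not actual output from the tool; the class names and structure are assumptions for illustration only.

```html
<!-- Hypothetical sketch of the kind of output the tool produces:
     everything inline in one index.html, styled with Tailwind
     utility classes. Not real generated output. -->
<div class="flex items-center gap-2 rounded-xl border border-gray-300 bg-white px-4 py-2 shadow-sm">
  <input type="text" placeholder="Message ChatGPT..."
         class="flex-1 bg-transparent text-gray-800 placeholder-gray-400 focus:outline-none" />
  <button class="rounded-lg bg-gray-200 p-2 text-gray-500 hover:bg-gray-300">
    <!-- a send icon would go here, e.g. a Font Awesome <i> tag -->
    &#10148;
  </button>
</div>
```

When icons overflow the bar, as happened in the video, a follow-up prompt like "the icons are breaking out of the input bar" is the iteration loop in action: the tool resends the screenshot plus your note and regenerates the markup.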
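For reference, the setup steps described above can be sketched roughly as follows. This is a hedged outline, not the repo's exact instructions: the uvicorn entry point, port, and `.env` filename are assumptions based on the general layout described in the video, so check the repository's README for the current commands.

```shell
# Sketch of the local setup: backend and frontend live in
# separate directories inside the cloned repo.
git clone https://github.com/abi/screenshot-to-code
cd screenshot-to-code/backend

# Put your OpenAI key where the backend can read it
# (exact variable/file name may differ -- see the README).
echo "OPENAI_API_KEY=sk-your-key-here" > .env

poetry install            # install backend dependencies
poetry run uvicorn main:app --reload --port 7001   # start the backend

# In a second terminal (e.g. a split VS Code terminal):
cd screenshot-to-code/frontend
yarn                      # install frontend dependencies
yarn dev                  # frontend served at http://localhost:5173/
```

Running the two halves in split terminals, as shown in the video, keeps both log streams visible while you iterate.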