
In this video, I show how to use GPT Crawler to quickly build a knowledge base for OpenAI's Custom GPTs. Custom GPTs are customizable and shareable, and may become monetizable in the near future. GPT Crawler lets you crawl any website and generate knowledge files, so you can create your own custom GPT from one or multiple URLs. We also look at how this process can be applied to ChatGPT and GPT-4 to make them more tailored to your needs.

Links:
https://github.com/BuilderIO/gpt-crawler#readme
https://openai.com/blog/introducing-gpts
https://chat.openai.com/gpts/editor
https://platform.openai.com/playground

Support the channel:
Patreon: patreon.com/DevelopersDigest
Buy Me A Coffee: buymeacoffee.com/developersdigest
Website: developersdigest.tech
GitHub: github.com/developersdigest
Twitter: twitter.com/dev__digest
---
type: transcript
date: 2023-11-23
youtube_id: CFMK_707xqg
---

# Transcript: GPT Crawler: Turn Any Website into a Knowledge Base for OpenAI's Custom GPTs

In this video I'm going to be showing you how you can set up your own custom GPT, with a knowledge base for the chatbot created by recursively crawling a URL. GPTs are a new product that came out during OpenAI's Dev Day, and a GPT is essentially a custom version of ChatGPT that you can configure and set up with the data you'd like to provide it, whether or not you'd like to use DALL·E or Code Interpreter. It makes it really easy to set up these chatbots, and you essentially don't need any coding to get them up and running.

Now, with that said, I'll also show you how you can set this up within their API, if you're looking to leverage this within something you might have already built, or if you'd like to have more control than what's given through the GPTs interface.

The way we're actually going to generate the file that we're going to use as our knowledge base is with this GPT Crawler library. If you head over to the GitHub repository for this (I'll put the link in the description), you can pull it down: git clone it, set up a directory in VS Code, and we'll be using this to create what you can think of as our knowledge file.

Getting this up and running is very simple. Once you've pulled it down, you can either bun install or npm install. It might take a moment because it's going to install Puppeteer, which is a little bit of a bigger library. Once it's all set up, the nice thing is that you're not using any OpenAI API for embeddings or anything; you're just crawling locally, so there's no cost to do this. It's relatively quick, though the speed is going to be limited by how many pages it's crawling.

So I'm going to show you an example with LangChain. Essentially, what I'm going to be doing here is I'm going
to hit the URL that you see on line five here, and I'm only going to crawl matching URLs, so essentially anything after the docs path here. Then I'm going to set the max pages to crawl to 1,000, and the output file is where it's going to generate the output.

I ran this earlier, before the video, just to show you what this looks like. It generates this JSON document that has the title of the page, the URL of the page, and, in this example, the entire contents of the page. If you want to get a little more specific, you also have a selector option where you can target specific areas of the page, but this is a sort of general catch-all where it will just grab all of the HTML from the page.

Once you've set it all up and configured it, all you have to do is either bun start or npm start, so I'll just run it here in the background. The one thing with this is that if you stop it mid-run, it's not going to give you the output up until that point; it has to complete the run. In the case of LangChain, I think it was 400-and-some pages to go through all of their documentation and generate the output file that I have here. But say you just wanted to test this: you could set the max pages to 10 or something low and see how it works on a smaller data set. I'm just going to go ahead and stop this here, since we already have our output file.

What you can do next is go to chat.openai.com/gpts/discovery. That will give you this overall page with some examples of other chatbots that they've built as sort of boilerplate. If you click to create a GPT, you have the option to build it with natural language. So I could start it and say, "I want to build a chatbot for LangChain documentation." The neat thing is that it will do a few things in the background: it might generate a profile picture, generate a title for you, and do some of the helpful things a lot of us use ChatGPT for, like helping out with certain ideas. We'll let this run for just a moment. So I'll just say yes, we'll use LangChain Guide.

Once it's set up and you want to do further tweaks, or if you don't want to do the natural language setup, you can also just hop right into the configuration. I see it's generating a profile photo for us. The natural language setup is helpful for that initial jump into it, but I'd imagine most people will want to tweak this a little bit further. Here we have this cool little DALL·E photo that it generated. Let's call this DocuChain. You see the instructions; I'm not going to read through all of them, but you get the general sense. It also has these nice prompting questions that you can put in and customize.

You can also turn some of these features on and off. In this case I don't need web browsing, Code Interpreter, or DALL·E, since what I'm going to be doing is targeting this output file directly. To upload the file, as you might expect, just click upload and pick the output file; it's generated at the root of the GPT Crawler directory. Once that's set up, that's pretty much it: you can go ahead and start to query your chatbot.

Now, the other thing I wanted to point out is, if
you're looking to do this within the Playground, that while you can set this up from their API, you are going to be incurring costs, even in the Playground, for retrieval, for the different tools you're leveraging, for the model, and whatnot. So be mindful of that if you're looking to set this up for your own custom offering with the OpenAI API, or if you're just playing around in the Playground. The pricing is well laid out on their pricing page, but you're going to incur costs for retrieving documents and for GPT-4, and it can add up relatively quickly depending on usage. That's pretty much it for this one.

One last thing I wanted to point out: if you want to share this with someone, you can send a link directly to them. If you want to make it public, you'll have to set your name to public within the offering that you have here, and that is potentially going to index it within the coming GPT store. So if you build something really interesting and novel, and a lot of people use it, you could potentially be in for some revenue sharing from OpenAI.

So that's it for this one. Hopefully you found this useful. If you did, please like, comment, share, and subscribe, and otherwise, until the next one.
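
The crawl settings described in the transcript (start URL, match pattern, optional selector, max pages, output file) and the JSON records the crawler writes can be sketched roughly as below. This is a minimal illustration, not the repository's actual config file: the `CrawlConfig` and `CrawledPage` types and the `summarize` helper are hypothetical, the `html` field name is an assumption about how page contents are stored, and the `example.com` URLs are placeholders because the video does not spell out the exact docs address.

```typescript
// Hypothetical local types. They mirror the options named in the video
// but are defined here so the sketch runs on its own; check the
// repository's config file for the real shape.
interface CrawlConfig {
  url: string;             // page where the crawl starts
  match: string;           // only links matching this pattern are followed
  selector?: string;       // optional: limit extraction to part of the page
  maxPagesToCrawl: number; // upper bound on pages visited
  outputFileName: string;  // where the knowledge file is written
}

// One record per crawled page. The video shows a title, a URL, and the
// page contents; the field name `html` is an assumption.
interface CrawledPage {
  title: string;
  url: string;
  html: string;
}

// Placeholder URLs, with a low page limit for a quick trial run
// as suggested in the video.
const testRun: CrawlConfig = {
  url: "https://example.com/docs",
  match: "https://example.com/docs/**",
  maxPagesToCrawl: 10,
  outputFileName: "output.json",
};

interface Summary {
  pageCount: number;
  totalChars: number;
  emptyPages: string[]; // URLs whose contents are blank
}

// Summarize parsed crawler output before uploading it as a knowledge file.
function summarize(pages: CrawledPage[]): Summary {
  let totalChars = 0;
  const emptyPages: string[] = [];
  for (const page of pages) {
    totalChars += page.html.length;
    if (page.html.trim().length === 0) emptyPages.push(page.url);
  }
  return { pageCount: pages.length, totalChars, emptyPages };
}

// In-memory stand-in for the parsed contents of the output file.
const sample: CrawledPage[] = [
  { title: "Introduction", url: "https://example.com/docs/intro", html: "Welcome to the docs." },
  { title: "Empty page", url: "https://example.com/docs/empty", html: "   " },
];

const summary = summarize(sample);
console.log(testRun.outputFileName, summary);
```

A check like this is only a convenience: pages with blank contents may suggest that a selector was too narrow, and the total size gives a rough sense of how large the knowledge file is before it is uploaded.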