
Welcome to this video tutorial on how to build an OpenAI Web Search RAG LLM API with Bun.js. ⚡ In this tutorial, we're going to build a cutting-edge API that uses OpenAI's API. It will also scrape the web using Cheerio and gather search data from Brave Search. The result is an API that performs text processing and returns highly relevant search results. ⚡

👉 Features:
- Query rephrasing with OpenAI GPT-3.5
- Data scraping using Cheerio
- Web search via Brave Search
- Text processing for similarity search
- Server built with Bun.js
- LLM similarity search, text splitting, etc. with Langchain.js

🚀 Quick Start Guide 🚀
1️⃣ Get your API keys: OpenAI: Get API Key | Brave Search: Get API Key
2️⃣ Set up your environment variables.
3️⃣ Install required packages with `bun install langchain openai cheerio`.
4️⃣ Run your server.

👤 About DevelopersDigest 👤
For more awesome tutorials and the latest in development, check out our:

🔗 Links:
- Website: https://www.developersdigest.tech/
- Patreon: https://www.patreon.com/DevelopersDigest/
- Twitter: https://twitter.com/dev__digest

📁 Repository: https://github.com/developersdigest/OpenAI-Web-Search-RAG-LLM-API-with-BUN.js

👍 Like this video if you found it helpful! Don't forget to subscribe to stay updated with the latest content. 👍
---
type: transcript
date: 2023-11-01
youtube_id: oZb_ccbJhPw
---

# Transcript: Build a RAG LLM Web Scraping API with BUN.js in 11 Minutes

In this video I wanted to quickly show you how you could set up a very quick and easy-to-implement RAG endpoint that scrapes some of the top search results for a query that a user sends to the endpoint. All of this is going to be set up with a Bun server, and what you'll be able to do is make a POST request with all of these different options; depending on what you pass in your POST request, you'll get the similarity search results as well as the LLM results, which take in the similarity search results to generate an answer. I'll go into this in a lot more depth as I go through it. It's only going to be about 100 lines of code to get this all set up. I'll also be posting a GitHub repository link for this, so feel free to do whatever you like with it, and hopefully it helps with understanding different approaches for how you could set something like this up.

The first thing we're going to do is initialize a project: just run `bun init` in your terminal and go through the prompts. I'm going to be using JavaScript in this; you can use TypeScript if you want. Once you have it set up, go ahead and create a `.env`. We're going to be using the OpenAI API for embeddings as well as GPT-3.5 Turbo. Once you have that plugged in, go ahead and make an account on Brave if you don't have one; you get 2,000 free search queries per month that you can use in this implementation to try it out. Once you have that set up, go into your `index.js` and we'll start to run through it. The first thing we're going to do is import a handful of things. Don't worry about these too much; I'll get into them as we actually get into the code.
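The setup steps above can be sketched as follows. The package list comes from the video; the exact `.env` variable names are assumptions for illustration, so use whatever names your code actually reads:

```sh
# Scaffold a new Bun project and answer the prompts (JavaScript here)
bun init

# Install the dependencies mentioned in the video
bun install langchain openai cheerio

# .env — variable names below are assumptions, not confirmed by the source:
# OPENAI_API_KEY=...
# BRAVE_SEARCH_API_KEY=...
```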
The first thing we're going to do is initialize both the OpenAI client and the embeddings model. Then we're going to set up a simple Bun server. I'm using port 3005 because I have something running on port 3000; feel free to use whatever open port you have. Then we're going to have our fetch handler, and we're going to be looking for a POST request; if it's not a POST request, we'll just send back a message indicating so.

Throughout the application, as you'll see in my terminal, I'm going to be logging out the operations in the order they're executed, so you can hopefully follow what's going on in the process. You can remove these console logs afterwards if you like, but I'll leave them in to help demonstrate things.

Once we have that set up, we're going to extract and destructure some values from the message, and also set some defaults if they aren't already passed: things like chunk size, text overlap, and whether to return the LLM results. All of these are different options that you can turn on and off depending on what you want from your API request.

Next, we're going to take the input from the message and ask GPT-3.5 Turbo to rephrase it. The reason we do this is that we'll hopefully get a better result than what's just passed in the raw message. Imagine this is a chat application and someone says "what is the news" with maybe some other information mixed in; with this function, we're essentially asking the LLM to give us a better query for a search engine. This is optional, so you don't necessarily need it, but it's an extra step that will hopefully
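The option-destructuring step described above could look like the sketch below. The option names and default values are assumptions for illustration; the video doesn't spell out the exact identifiers used in the repository:

```javascript
// Pull the query and tuning options out of a POST body, applying defaults
// when a field isn't passed. Field names and defaults are illustrative.
function parseOptions(body) {
  const {
    message,
    chunkSize = 800,              // size of each text chunk for splitting
    chunkOverlap = 0,             // characters shared between adjacent chunks
    numberOfSimilarityResults = 2, // how many similarity matches to return
    numberOfPagesToScan = 4,       // how many search results to scrape
    returnLLMResults = true,       // whether to also call the LLM
  } = body;
  return {
    message,
    chunkSize,
    chunkOverlap,
    numberOfSimilarityResults,
    numberOfPagesToScan,
    returnLLMResults,
  };
}
```

Inside the Bun fetch handler, you would call something like `parseOptions(await req.json())` only after confirming `req.method === "POST"`.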
help this perform a little bit better.

Next, we're going to define our search engine function. We'll log out that we're initializing the process, set up our loader for Brave, and then call and await the rephrased message, so execution won't continue until that's received. Then we'll specify the number of results we want from Brave and pass in the rephrased message that we got from GPT-3.5 Turbo.

Once we have that, we're going to normalize the data. I went ahead and excluded anything that included brave.com; the reason is that when I asked for news results, it was returning results from Brave itself, and that wasn't something I necessarily wanted. This is just an example of how you could exclude other links if you wanted to. Then we parse through the data; there are a number of fields within the response that you can use if you'd like, but in our case we're just going to use the link and the title.

Once we've normalized the data, we're going to fetch the page content. All the fetch-page-content step does is make a simple GET request to the page. One thing to note is that if the website is built with a modern framework, there's a possibility it won't parse, because you might need something like Puppeteer to load in that response. We're going to keep this really lightweight, so if the page has some indication that it's from a modern framework, or it can't be parsed for whatever reason, we'll just skip over that result.

Once we have the content, we're going to do a quick removal of the things we don't need. Because we're likely asking questions related to the text on the page, we can
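The normalization step described above can be illustrated with a small helper. The `{ title, link }` result shape matches what the video describes; the function name and the input field names are assumptions:

```javascript
// Keep only the title and link from each search result, and drop anything
// hosted on brave.com (the video excludes Brave's own pages from news
// queries). Illustrative sketch — field names are assumed.
function normalizeData(results) {
  return results
    .filter((r) => r.link && !r.link.includes("brave.com"))
    .map(({ title, link }) => ({ title, link }));
}
```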
remove things like script tags, style tags, the head tag, even the nav and footer, because they're probably not pertinent to the question being asked.

Next, we're going to process and vectorize the content. We'll establish the vector count and ensure that we only process the number of vectors and pages specified in our POST request. We fetch the page contents, then wait for each of them to be returned. Once we have them, we do a very crude and simple check to make sure there are at least 250 characters' worth of text; you could even bump this threshold higher. We essentially want to skip over those thin pages, because they likely don't have content we'll be able to parse.

Once we have that, we're going to break up our text with the RecursiveCharacterTextSplitter from Langchain. We can specify the chunk size and the chunk overlap from the POST request, and then we'll split up all of that HTML content. From those chunks, we're going to create vectors and build memory vector stores for each of them, repeating this process until they're all done. Once that's done and we've created vectors and embeddings of everything, we're going to perform our similarity search: we pass in the message as well as the number of results we want to show. Once we have that, we process all the normalized data, and from there we fetch and process our sources. So what we're going to
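The two checks described above can be shown with a minimal stand-in: skip pages under 250 characters, then split the text into overlapping chunks. In the real implementation the splitting is done by Langchain's RecursiveCharacterTextSplitter; this plain-JS version just sketches the idea with a simple sliding window:

```javascript
// Crude stand-in for the processing step: reject thin pages, then split the
// remaining text into chunks of `chunkSize` characters where adjacent chunks
// share `chunkOverlap` characters. Not Langchain's actual algorithm — the
// real splitter also prefers splitting on paragraph/sentence boundaries.
function splitIntoChunks(text, chunkSize, chunkOverlap) {
  if (text.length < 250) return []; // crude "is there enough content" check
  const chunks = [];
  const step = chunkSize - chunkOverlap;
  for (let i = 0; i < text.length; i += step) {
    chunks.push(text.slice(i, i + chunkSize));
    if (i + chunkSize >= text.length) break;
  }
  return chunks;
}
```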
do is actually invoke the search engine process, and from there we get that last response: we say, okay, we have all of our sources, and we prepare our response.

Right before our last response, we're essentially preparing the content to send into our LLM. We say: here are the top results from our similarity search (we stringify all the sources), and then, based on those similarity search results, here's the query; respond back, ideally in a sentence or two. One thing to note is that the LLMs are going to be the biggest bottleneck in this application. If you remove the rephraser, that will help speed things up, and if you pass in fewer similarity search results, as well as asking for something like a brief response, it will perform faster; that's why I have "respond within a sentence or two."

Once we've done that, we call our chat completion, passing in what we had just declared above, and log out that we've sent the results for chat completion. Finally, we set up our response object and check the particulars of the POST request: if the return-LLM-results flag is in that destructured variable above, it will be returned; if not, it won't. Then we log out the final message that we're constructing the response and send it out.

Next, like I mentioned early in the video, we're only going to be serving POST requests, so if someone tries to make a GET request or something like that, we just log out a message. Finally, we log out that the server is listening on the port once the application is run, so once you've actually run `bun index.js` and started the server. From there, you can go ahead
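The prompt assembly described above might look like the sketch below. The wording is paraphrased from the video, not a verbatim copy of the repository's prompt, and the function name and source shape are assumptions:

```javascript
// Build the chat-completion prompt: stringify the top similarity matches,
// then ask for a brief answer to keep the LLM (the app's main bottleneck)
// fast. Paraphrased wording — not the repo's exact prompt.
function buildPrompt(sources, query) {
  return [
    "Here are the top results from a similarity search:",
    JSON.stringify(sources),
    `Based on those results, respond to this query: "${query}".`,
    "Ideally respond in a sentence or two.",
  ].join("\n");
}
```

The resulting string would then be passed as the user message in the chat-completion call.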
and query it. If I query this again with "what is the news," you'll see that it runs through, starts to extract the page contents, logs out all the different steps, and then you get the LLM result as well as the similarity results, which gives you an idea of what's being passed to the LLM. Here you see I'm asking for just two pages to be scanned and just one similarity result; but say we want five similarity results and four pages to be scanned, and we bump the chunk size up to, let's say, 300. You'll see that it scans four of the pages, and the similarity results are much, much longer.

The thing to note is that there is a context limit, obviously, and if you're scanning a number of pages, you might want to remove something like "ideally respond in a sentence or two," because you might be looking for more than that amount of content. But like I said, if you do that, it will take a little bit longer to respond. So, just a quick, simple one today. Hopefully you found this useful; if you did, please like, comment, share, and subscribe. Otherwise, until the next one.
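A request body matching the second example in the demo might look like this; the field names are assumptions mirroring the options described earlier, not confirmed identifiers from the repository:

```json
{
  "message": "what is the news",
  "numberOfPagesToScan": 4,
  "numberOfSimilarityResults": 5,
  "chunkSize": 300,
  "returnLLMResults": true
}
```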