
In this video, I'll show you how to set up internet-enabled responses from LLMs using Serper and Firecrawl with dynamic model routing. We'll use a model router called Not Diamond to dynamically route queries to different models, including Anthropic, OpenAI, and Gemini. You'll learn to scrape web pages, handle embeddings with LangChain and OpenAI, and configure the various APIs. I'll walk you through the entire setup process, including installing dependencies and creating the necessary routes and functions in TypeScript. By the end, you'll be able to create a flexible, context-aware LLM response system.

Repo: https://github.com/developersdigest/internet-enabled-llms

Links:
https://www.firecrawl.dev/
https://serper.dev/
https://www.notdiamond.ai/
https://ai.google.dev/gemini-api
https://www.anthropic.com/api
https://openai.com/index/openai-api/

00:00 Introduction to Internet-Enabled Responses
00:04 Setting Up the Tools
00:15 Model Routing with Not Diamond
00:26 Example Query: ChatGPT Canvas
00:51 Skipping Embeddings for Full Context
01:49 API Keys and Configuration
02:30 Installing Dependencies
03:17 Setting Up the API Route
04:07 Search Functionality with Serper
05:07 Scraping with Firecrawl
06:46 Embedding Setup and Optimization
07:53 Generating LLM Responses
08:25 Final Steps and Error Handling
10:00 Conclusion and Thanks
---
type: transcript
date: 2024-10-31
youtube_id: tnyAaX3Pr_g
---

# Transcript: Firecrawl for Internet-Enabled LLM Responses with Model Routing

In this video I'm going to be showing you how to set up internet-enabled responses from LLMs. What we're going to be using for this is Serper and Firecrawl, optionally with OpenAI as well as LangChain for embeddings, and then we're going to be using a model router called Not Diamond. Depending on the query that you send in, it's going to route dynamically to different models; in this case I have some models from Anthropic, OpenAI, as well as Gemini.

Let's just try this out: "when did ChatGPT Canvas come out?" We're going to specify that we want to scrape three pages, and we're not going to be skipping the embedding step in this example. What this will do is give us a response back: ChatGPT Canvas was launched on October 3rd, 2024. We can see that it routed to GPT-4o mini based on the context it was sent from what was scraped, and also on the similarity results from the embeddings.

Now, why I set this up the way that I did is that sometimes you actually want all of the context that might be on a website. So if I set skip embeddings to true, it's going to send the payload of the entire markdown that we scraped through Firecrawl into the LLM for a response. The use case for this is when context is really important. If I say something like "give me an example of how to set up Firecrawl in Node.js" and send that through, we're going to get the top three results and skip that entire embedding portion. The time to first token might be a little bit longer, because we are sending in the entire page for each of the pages that we scraped, but if we look at the answer, it's giving us instructions that at a glance look correct. Here's where this can be useful: if I ask ChatGPT how to set up Firecrawl in Node.js, it's not actually giving me what I want. What I wanted was that example using Mendable, which is the correct way to set this up.

I'm going to be walking through the endpoint and how to set it up. The API keys we'll need are: a search engine API key from Serper, an API key for Firecrawl, and an API key for OpenAI, which we're using for embeddings. It's also set up in a way that you can change. If you just want to use one of the providers, say OpenAI with GPT-4o mini through to the o1 series of models, you could do that; or if you want to use, say, Anthropic with Haiku through Sonnet and Opus, you could do that; or you can use all of the different models from all of the different providers that you'd like. You really have a lot of flexibility, and you'll see in the example how you can select the different models that you want to route between.

If you're pulling down the repo, you'll be able to just `bun install` everything. Alternatively, if you're going through this step by step, you can run `bun create next-app` within the root of your directory. Once you're inside the directory, we're going to install three different things: Firecrawl, LangChain, and finally Not Diamond. From there, just go ahead and add in your API keys: Firecrawl, Serper, Not Diamond, and OpenAI. Like I mentioned, if you just wanted to use something like the OpenAI API key alone, you could do that as well and just specify the different models that you want to route towards with the OpenAI series of models. Once you have those added, just make sure you remove the `.example` from the `.env.example` file in the repo.

Once we have our packages installed and our environment variables set up, we can go into `app` and set up a route at `api/llm`, and then we're going to make a file called `route.ts`. The first thing we're going to do within our route is import all of our dependencies: Firecrawl, LangChain, as well as Not Diamond. From there, we set up our environment variables and do a quick check to make sure we have all of them. If you just want to use OpenAI, you can do a little bit of reconfiguring where you swap out Google as well as Anthropic and just use OpenAI, or whatever combination you'd like to route towards. Once we do that, we initialize all of the clients that we're going to be using: Firecrawl, Not Diamond, and finally the embeddings API, for if we decide to use embeddings within our response. From there, we set up a handful of TypeScript interfaces, and what they're doing will become clear as we go through the rest of the code.

First, we're going to set up a function that handles our search functionality. In this case we're using Serper; I'll also put a link in the description if you want to use different providers, whether it's Google or DuckDuckGo. You can really use whatever you'd like. Essentially, how it works is that we send in the message. Alternatively, you could further optimize this: you could put an LLM in front of your search engine API to optimize the query for a search engine. You can imagine that if you send in a ton of context, it might not be the best query depending on what you're doing, but if you're just asking questions like these, this should work pretty well. Basically, we send a POST request with our API key and the message from the query of our API. If we have any errors, we log those out; otherwise, we send back all of that response data. Once we have that, we normalize the data a little bit. The main piece of this is the links, which we'll send to Firecrawl in the next step.

From here, we set up our function for Firecrawl. We pass in the payload that we got from the Serper API and map through all of the links that were returned. Right now, at the time of recording, there are some links that aren't supported, so when I was setting this up I ran into some issues where it would try to scrape social media sites. I made sure to exclude all of those, because for the most part they aren't relevant to the types of questions that are likely to be asked. Once we have that, we make sure none of the links we got from the Serper API match any of the excluded sites, and we filter out the ones that do. If for whatever reason there are no valid URLs (say you're asking something really specific about Facebook and all of the search engine results are from those links), we just send back a message to the client.

From there, we send all of those filtered URLs to the new batch scrape capability within Firecrawl. We specify that we effectively want the entire page, and we want it in the format of markdown. The reason markdown is helpful is that it's a succinct block we can send to the LLM, and it makes it really easy to send in that entire context without the additional wasted tokens that could be in HTML, for instance. Similarly to above, if there are any errors, we send those back to the client. Finally, we map through all of the results that we got from Firecrawl. You get the content, and you also get a little bit of metadata for the page; in this case we return the URL, the title, as well as the content, which you can think of as effectively the entirety of the page.

Next, we set up our embeddings. This is essentially that optional RAG functionality, which can be useful if you don't want to pass in the entire context of all of the pages; you might have to do this depending on which LLMs you decide to use. We use a recursive character text splitter. All this basically does is go through all of the content that we scraped and send it to OpenAI to create embeddings for us. From there, we store those vectors in memory and immediately query them within the same API request. This allows us to send a smaller payload to the LLM, which saves on cost as well as, potentially, latency for things like the time to first token.

Next, we have a simple function whose only job is to concatenate all of the pages' contents together. This gives us that nice, succinct text block of all of the context that we're passing in. For each page we say: here's the page title, the URL, as well as the content, and it loops through all of them one by one.

Once we have that, we generate our LLM response. The way that Not Diamond works is that we can specify the different LLM providers; here I have OpenAI's GPT-4o mini, Anthropic, and Google. Then you can specify the tradeoff: you have the option of cost or latency, or it will just default to optimizing the quality of the response. The one thing to note is that you're not actually sending any API keys through Not Diamond. It returns what it thinks is the best model for the query you send in, and then we generate the response on our side, within our client code.

Once we have that, we set up our POST request. First we set up a simple try/catch. Within it, we destructure three things from the request: the message, the number of pages to crawl, and whether or not to skip the embeddings process. First we make sure we have a valid message from the query; that's the one thing we absolutely have to get. From there, we log out all of the messages, so that while you're setting this up you have something in your server logs or on your endpoint. Then we validate the pages to crawl, making sure the value is within range. From here, we invoke the process that we set up in the previous steps: we query our search engine API, we scrape those URLs, and once we have all of those results back, we check whether the user specified to skip the embeddings or not. Here is where you can also specify the chunk size, which is the number of characters we pass in for the vector embeddings, as well as the number of similarity results; you can play around with these numbers if you'd like. A good next step, potentially, would be to expose these in the API request if you want additional control over the number of similarity results, the chunk size, and the overlap.

Once we have that, we log out the generated content, and finally we generate the response and make sure it's valid before we prepare it and ultimately send it back with a payload containing the answer, the selected model from Not Diamond, and the crawl information: the actual pages that were scraped and whether embeddings were used or not. Then we send that back to the client, and if we have any errors, we log those out.

That's pretty much it for this video. I wanted to thank Firecrawl for partnering on this video. I'll put all of the links to everything in the description, but if you found this video useful, please comment, share, and subscribe. Otherwise, until the next one!
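The Serper search step described in the walkthrough can be sketched as a small TypeScript helper. The endpoint URL and `X-API-KEY` header come from Serper's public API; the function names, and the choice to return only the organic result links, are my own:

```typescript
// Build the Serper request (POST https://google.serper.dev/search with the
// query in `q` and the key in the X-API-KEY header, per Serper's docs).
function buildSerperRequest(query: string, apiKey: string) {
  return {
    url: "https://google.serper.dev/search",
    init: {
      method: "POST",
      headers: {
        "X-API-KEY": apiKey,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ q: query }),
    },
  };
}

// Send the user's message to Serper and pull out the organic result links,
// which are what we hand to Firecrawl in the next step.
async function searchWeb(query: string): Promise<string[]> {
  const { url, init } = buildSerperRequest(query, process.env.SERPER_API_KEY ?? "");
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Serper request failed: ${res.status}`);
  const data = await res.json();
  return (data.organic ?? []).map((r: { link: string }) => r.link);
}
```

As the video suggests, you could put an LLM in front of `buildSerperRequest` to rewrite a long, context-heavy message into a tighter search query.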
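The social-media exclusion described above might look like the following. The video doesn't list the exact hosts, so the `EXCLUDED_HOSTS` entries here are assumptions:

```typescript
// Hosts to skip before scraping; the video only says "social media sites",
// so this particular list is illustrative.
const EXCLUDED_HOSTS = [
  "facebook.com",
  "instagram.com",
  "twitter.com",
  "x.com",
  "linkedin.com",
  "tiktok.com",
];

// Keep only scrapable links, capped at the number of pages the caller asked
// to crawl; malformed URLs are dropped rather than passed to Firecrawl.
function filterScrapableUrls(urls: string[], limit: number): string[] {
  return urls
    .filter((u) => {
      try {
        const host = new URL(u).hostname.replace(/^www\./, "");
        return !EXCLUDED_HOSTS.some((h) => host === h || host.endsWith(`.${h}`));
      } catch {
        return false;
      }
    })
    .slice(0, limit);
}
```

If this returns an empty array (say, every result was a Facebook link), the route sends an explanatory message back to the client instead of scraping.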
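Here is a sketch of the scraping step. The video uses Firecrawl's newer batch scrape capability; for simplicity this sketch calls the single-URL v1 `/scrape` endpoint once per link, and the response field names (`markdown`, `metadata.title`, `metadata.sourceURL`) follow Firecrawl's v1 docs. The helper names are my own:

```typescript
interface ScrapedPage {
  url: string;
  title: string;
  content: string;
}

// Normalize one Firecrawl response into the shape the rest of the route
// uses: URL, title, and the page content as markdown.
function toScrapedPage(
  data: { markdown?: string; metadata?: { title?: string; sourceURL?: string } },
  fallbackUrl: string,
): ScrapedPage {
  return {
    url: data.metadata?.sourceURL ?? fallbackUrl,
    title: data.metadata?.title ?? fallbackUrl,
    content: data.markdown ?? "",
  };
}

// Scrape each filtered URL as markdown; failed scrapes are skipped rather
// than failing the whole request.
async function scrapePages(urls: string[]): Promise<ScrapedPage[]> {
  const results = await Promise.all(
    urls.map(async (url) => {
      const res = await fetch("https://api.firecrawl.dev/v1/scrape", {
        method: "POST",
        headers: {
          Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({ url, formats: ["markdown"] }),
      });
      if (!res.ok) return null;
      const json = await res.json();
      return toScrapedPage(json.data ?? {}, url);
    }),
  );
  return results.filter((r): r is ScrapedPage => r !== null);
}
```

Requesting markdown rather than HTML is what keeps the context block succinct and avoids wasting tokens on markup.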
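The route uses LangChain's `RecursiveCharacterTextSplitter` before embedding. To make the chunk size and overlap parameters concrete, here is a simplified fixed-size splitter; LangChain's actual implementation additionally prefers to break on paragraph and sentence boundaries:

```typescript
// Split text into overlapping chunks. `chunkSize` is the number of
// characters per chunk and `chunkOverlap` is how many characters each
// chunk shares with the previous one, mirroring the two knobs the video
// suggests exposing in the API request.
function splitText(text: string, chunkSize = 1000, chunkOverlap = 200): string[] {
  if (chunkOverlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunk size");
  }
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Each chunk is then embedded via OpenAI, stored in an in-memory vector store, and queried immediately within the same request, so only the most similar chunks reach the LLM.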
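The concatenation helper used when embeddings are skipped could look like this. The exact separator format is my own; the video only says it emits the page title, URL, and content for each page in turn:

```typescript
interface PageResult {
  url: string;
  title: string;
  content: string;
}

// Concatenate every scraped page into one context block for the LLM:
// title, then URL, then the markdown content, page by page.
function combinePageContents(pages: PageResult[]): string {
  return pages
    .map((p) => `Title: ${p.title}\nURL: ${p.url}\n\n${p.content}`)
    .join("\n\n---\n\n");
}
```

This is the full-context payload that trades a longer time to first token for having everything on the scraped pages available to the model.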
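The Not Diamond routing call sends the query and the candidate provider list, never the provider API keys, and the completion is generated on our side afterwards. This sketch follows Not Diamond's REST API as I recall it (the endpoint path and the `llm_providers` and `tradeoff` fields); double-check the current API reference or the `notdiamond` SDK before relying on it:

```typescript
// Candidate models to route between; swap these for whichever providers
// you configured, as the video describes.
const LLM_PROVIDERS = [
  { provider: "openai", model: "gpt-4o-mini" },
  { provider: "anthropic", model: "claude-3-5-sonnet-20240620" },
  { provider: "google", model: "gemini-1.5-pro" },
];

// Build the modelSelect body; omitting `tradeoff` defaults the router to
// optimizing response quality, per the walkthrough.
function buildModelSelectBody(message: string, tradeoff?: "cost" | "latency") {
  return {
    messages: [{ role: "user", content: message }],
    llm_providers: LLM_PROVIDERS,
    ...(tradeoff ? { tradeoff } : {}),
  };
}

// Ask Not Diamond which model to use; fall back to the first configured
// provider so a routing failure doesn't take the endpoint down.
async function selectModel(message: string) {
  const res = await fetch("https://api.notdiamond.ai/v2/modelRouter/modelSelect", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.NOTDIAMOND_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildModelSelectBody(message, "cost")),
  });
  if (!res.ok) return LLM_PROVIDERS[0];
  const data = await res.json();
  return data.providers?.[0] ?? LLM_PROVIDERS[0];
}
```

Whatever provider/model pair comes back is then used with your own client and API key to generate the actual answer.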
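Finally, the input validation in the POST handler can be sketched as follows. The field names, the 1 to 10 page range, and the default of 3 pages are illustrative assumptions, not values stated in the video:

```typescript
interface LlmRequestBody {
  message?: string;
  pagesToCrawl?: number;
  skipEmbeddings?: boolean;
}

// Require a non-empty message and clamp the pages-to-crawl count into a
// sane range, mirroring the checks described in the walkthrough.
function validateBody(body: LlmRequestBody): {
  message: string;
  pagesToCrawl: number;
  skipEmbeddings: boolean;
} {
  if (!body.message || body.message.trim().length === 0) {
    throw new Error("`message` is required");
  }
  const pagesToCrawl = Math.min(10, Math.max(1, Math.floor(body.pagesToCrawl ?? 3)));
  return {
    message: body.message,
    pagesToCrawl,
    skipEmbeddings: body.skipEmbeddings ?? false,
  };
}
```

In the actual route this would run inside the try/catch at the top of the POST handler, returning a 400 response instead of throwing.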