
In this video, I walk you through the step-by-step process of creating your own answer engine, much like Perplexity, using Groq, Mistral AI's Mixtral 8x7B, LangChain, OpenAI Embeddings, and the Brave Search API. This tutorial is designed for those interested in implementing such a system within a JavaScript or Node.js environment. I show you how to configure the engine to deliver not just answers, but also sources and potential follow-up questions in response to queries. The journey begins with the initial setup of the project, where I guide you through managing API keys from OpenAI, Groq, and the Brave Search API. From there, we move on to initializing an Express server to handle incoming requests. I place a strong emphasis on the speed of the inference process and share insights on optimizing the various components: the embeddings model, how we handle search engine requests, the method of text chunking, and the details of processing queries. As we progress, I demonstrate how to curate the response content, introduce streaming for more dynamic answers, and automate the generation of insightful follow-up questions. The tutorial rounds off with the final touches needed to get the server up and running smoothly. For those eager to dive in and start experimenting on your own, I'll be providing a link to download the entire repository in the video description soon. This is your chance to get hands-on experience and truly understand the ins and outs of building an advanced answer engine. And if you find this video helpful, don't forget to support the channel by subscribing and sharing it with others who might benefit from this tutorial. Stay tuned for more updates and happy coding!
---
type: transcript
date: 2024-03-07
youtube_id: 43ZCeBTcsS8
---

# Transcript: Build a Perplexity-Inspired Answer Engine Using Groq, Mixtral, Langchain, Brave & OpenAI in 10 Min

In this video I'm going to show you how to build your own answer engine, similar to the Perplexity-style answer engine, with Groq, Mixtral, and LangChain. If you're not familiar with Perplexity, the way it works is: if you send a query through their interface, first you'll get sources back, then you'll get an answer, and once the answer is complete you'll also get some follow-up questions which you can click on to interact further and get subsequent answers. I'm going to set this up within a JavaScript or Node.js environment, so you'll have something you can deploy to an endpoint, and you'll see how powerful the new Groq LPUs are at generating inference at speed. I built something similar to this a number of months ago, but one of the bottlenecks I ran into before using Groq was the inference speed. My best guess on how something like Perplexity is set up is: you have one inference call that generates a search query for a search engine API to fetch the sources, then you have the answer itself, which gets streamed out from the LLM, and finally you have the related follow-up questions. So I'm going to show you how you can set this up so that you return the sources, get that answer, and also have those follow-up questions: effectively the three main components within Perplexity. Now, with this I really tried to focus on speed, so I'll show you once more how it works. I'm going to ask: "Tell me about Anthropic's Claude 3." Essentially, to set this up, with the first message we send it to the Groq endpoint and ask it to rephrase the query. Once we have that, we're going to get all of the different sources, and
you'll be able to specify within the endpoint how many sources you want; in this case I set it to three. Once we have those sources, we're going to scrape all of the text content from them, break it up, embed it, query it, and then pass the top results, with all of that information from those web pages, to Groq to respond back to us within a stream. Once we have the full response back, we're going to query Groq again to get the follow-up questions like we see here. I'm going to put a repo in the description of the video where you can pull this down and play around with it, or you can follow along and see the different steps of how to set it up. The first thing you're going to have to do is just `bun init` a new project. I'm going to be using bun, but you could also use npm, so you can just run `bun init -y` within your terminal. From there you're going to install a handful of things: `bun install cheerio express langchain openai`. Once you've done that, go ahead and make a `.env` within the root of your directory. We're going to be using three different API keys: one from OpenAI (what we're using from OpenAI is their embeddings model), one from Groq, and one for the Brave Search API. Right now, at the time of recording, Brave gives you 2,000 free queries per month to play around with their API, and Groq is currently in its alpha/beta period and free to use. In terms of cost for embeddings, OpenAI's are very cheap. So just make sure to head on over to OpenAI, Groq, and Brave, and once you have all those keys plugged in, we're going to get started within our index file. I'm going
to have everything within here, and I'm going to run through it step by step. The first thing we're going to do is import a handful of different modules; like I mentioned, we're going to be leveraging LangChain for a variety of things, requiring the openai package, and then cheerio to do a little bit of simple parsing for our web pages. Once we have that set up, we're going to initialize an Express server. You can set this to whatever port you like; I just have mine set to 35. We're going to set up some simple Express middleware. Then we're going to initialize Groq and our embeddings. The thing with Groq is that it conforms to the OpenAI schema, so you can still leverage the openai package; they also have an SDK you can use if you'd like, but if you want to use the openai package you can just pass in the base URL as well as your API key for Groq. Then we're going to set up a POST request; you can put this at any endpoint you like, I'm just going to set it to the base URL in this example. Then I'm going to console.log an awful lot of things, so if you watched the example, you saw it runs through pretty quickly, logging in successive order how the different things are accomplished. First, we just log that we've received a POST request. Once we've done that, we're going to destructure the different data that you can send in to the API: the message, the number of sources, whether you want those sources returned (optional), whether you want follow-up questions returned (optional), and then the embedded-sources-in-LLM-response flag. If you want annotated responses similar to Perplexity, where the different sources are shown within the LLM response, that's what the embedded-sources-in-LLM-response option controls. Then we have some simple variables that you can play
around with for our RAG pipeline: the text chunk size for the number of characters we embed, the text chunk overlap for the overlap within those embeddings, the number of similarity results we query from our vector store, and the number of pages to scan. I have a bunch of defaults within here; you don't necessarily need to explicitly pass all of these in at once, just know that these are the default settings if you're playing around with it. The first thing we're going to do is declare a function that rephrases our input. My rationale for doing this is that the input you send in won't always be one that's well received by a search engine API. In this case, we're just going to specify that we're using the Mixtral model; we're going to be using the Mixtral model for all of these different examples. Within the system message we say: you are a rephraser and you always respond with a rephrased version of the input that is given to a search engine API; always be succinct and use the same words as the input; only return the rephrased version of the input. I found that this works pretty well; you can play around with it a little if you want, but I did find it works reasonably well for accomplishing what we want to do. Then within our user message we pass in the input string that was sent in from our request, and we just log that we've rephrased the input and got an answer from Groq. Once we've done that, we're going to initialize the search engine process. We're going to set up Brave, pass in our message, and ask for the rephrased message that we had just set up. Once we have that, we get the documents from Brave; essentially all we need is the link and the title. Then we're going to normalize
the data. I also see a little mistake here, hard-coding the count of four, which I'll update within the repo. For our normalization, we're just going to make sure that we have a title and link and that the link doesn't include brave.com; I did find that for some queries it would often return brave.com, so we filter those out, and we return the title and link for each result. From there, we do a simple request for each page. One thing to note with all of these requests is that I have this set up with a simple fetch request; for a lot of pages that load content on the client side this isn't going to be enough, so you might have to use something like Puppeteer, which would inherently slow all of this down, if you want to be able to access a wider array of websites. Then, to extract the main content, we're going to parse out a number of different things: we remove the nav, the footer, script tags, all of the things we don't really want. All we really want to load is the text content, so we don't want to load things that are within an iframe or a script tag, etc. You can play around with this a little more and layer in filters for some other data if you'd like. Then this is where we set up the vector process. We're going to ignore any page that has fewer than 250 characters, and the reason for this is that if there's an error message on the page, or the site doesn't like that you're trying to make a fetch request or something, we just skip over it. Then we set up our RecursiveCharacterTextSplitter from LangChain, passing in the chunk size and the chunk overlap that we specified within the request. From there, we're going to put all of those different split-up texts within
our in-memory vector database, and within the metadata we put the link and the title; then we iterate through all of these to set it all up. Finally, once that is done, we perform a similarity search with the message that we sent in, passing in the number of results we specified. Here is where we get our sources, and also the parsed sources if we specified that we want those returned within our API, and then we filter out duplicate links so only one link per source is shown. From there, we just log that the RAG process is complete and that we're preparing the response content. For the main streaming portion, to set up the response for the answer from the LLM, we say: here's my query (we pass in the message), respond back with an answer that is as long as possible; if you can't find any relevant results, respond with "no relevant results found". Then, if we specified that we do want to embed sources in the LLM response to get that annotation, we conditionally add an instruction to return the sources used in the response with numbered markdown-style annotations. Then we JSON.stringify all of the sources. The reason we do this is that we pass in both the top results from our vector retrieval process and their metadata as well, because the metadata holds the link and the title, and that's what it will hopefully use to generate those annotations if we've asked for them. Finally, we specify that we have streaming set to true. For the API itself, I have it set up to respond back once everything is complete, so we're going to be concatenating our response as it gets streamed in here. The way the API is set up, it's going to respond once
we have that response in full. You could also tweak this a little if you want to stream the response out to your API directly. As we go through all of the different chunks that we receive back, we make sure that there is actually a message there and that the finish reason does not equal "stop". As it goes through, we write it out to the terminal, and we add each chunk to the response total. Once we have that set up, we build our response object with all of the different conditionals. If you said that you want sources returned, we return the sources within the object; the answer itself we always return; and for the follow-up questions, if you set return-follow-up-questions to true, we generate those follow-up questions, log them once they're generated, and send the response. Lastly, to generate the follow-up questions, we create another query. Within our system message we say: you are a question generator; generate three follow-up questions based on the provided text; return the questions in an array format. Within the content we say: generate three follow-up questions based on the following text, return the questions in the following format, and then we send those back within an array. Finally, we set up our server. So that's pretty much it to set all of this up; from there you can query it with whatever you'd like. I'll just query this one more time so you can see how it works: we see that it runs through all of those different steps relatively quickly, we get that streaming response back, we get our follow-up questions, and then we have everything within
the response here: we have the sources, we have the answer, and then we have the follow-up questions. So that's it for this video. If you found it useful, please comment, share, and subscribe, and consider becoming a paid subscriber on YouTube or Patreon to help support the channel. Cheers, until the next one.
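For reference, the project setup described in the walkthrough might look like this. The commands match what's mentioned above (bun plus the four packages); the key names in the `.env` sketch are placeholders, so check each provider's dashboard for the names your code actually reads.

```shell
# Initialize a new project with bun (npm works too: npm init -y)
bun init -y

# Install the dependencies used in the walkthrough
bun install cheerio express langchain openai

# .env in the project root (key names are placeholders):
# OPENAI_API_KEY=sk-...
# GROQ_API_KEY=gsk_...
# BRAVE_SEARCH_API_KEY=...
```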
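A minimal sketch of pointing the openai package at Groq, as described above: since Groq's endpoint conforms to the OpenAI schema, only the base URL and API key change. The base URL shown is Groq's OpenAI-compatible endpoint as I understand it; treat it and the environment variable name as assumptions.

```javascript
// Configuration for using the openai package against Groq's
// OpenAI-compatible endpoint instead of api.openai.com.
const groqConfig = {
  baseURL: "https://api.groq.com/openai/v1", // Groq's OpenAI-compatible base URL (assumption)
  apiKey: process.env.GROQ_API_KEY,          // key from the Groq console
};

// With the openai package installed, this becomes the client:
//   import OpenAI from "openai";
//   const groq = new OpenAI(groqConfig);
```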
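The rephrase step can be sketched as a small request builder. The system prompt is the one quoted in the walkthrough; the model id `mixtral-8x7b-32768` is an assumption based on Groq's Mixtral naming at the time of recording.

```javascript
// Builds the chat request for the rephrase step described above.
function buildRephraseRequest(input) {
  return {
    model: "mixtral-8x7b-32768", // Groq Mixtral model id (assumption)
    messages: [
      {
        role: "system",
        content:
          "You are a rephraser and you always respond with a rephrased " +
          "version of the input that is given to a search engine API. " +
          "Always be succinct and use the same words as the input. " +
          "Only return the rephrased version of the input.",
      },
      { role: "user", content: input },
    ],
  };
}

// Usage (hypothetical, with an OpenAI-compatible client named `groq`):
//   const completion = await groq.chat.completions.create(buildRephraseRequest(message));
//   const rephrased = completion.choices[0].message.content;
```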
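The Brave-result normalization described above (require a title and a link, drop anything pointing at brave.com) might look like this in plain JavaScript. The `{ title, url }` shape of each raw result is an assumption about Brave's response format.

```javascript
// Keep only results with both a title and a url, exclude brave.com
// links, and return just { title, link } for each.
function normalizeData(results) {
  return results
    .filter((r) => r.title && r.url && !r.url.includes("brave.com"))
    .map((r) => ({ title: r.title, link: r.url }));
}
```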
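As a plain-JS illustration of the chunk size and chunk overlap parameters, here is a simplified character splitter. It stands in for LangChain's RecursiveCharacterTextSplitter, which additionally prefers splitting on separators such as paragraphs and sentences rather than at fixed character offsets.

```javascript
// Split text into chunks of up to chunkSize characters, with each
// chunk overlapping the previous one by chunkOverlap characters.
function chunkText(text, chunkSize, chunkOverlap) {
  const chunks = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
    start += chunkSize - chunkOverlap; // step forward, keeping the overlap
  }
  return chunks;
}
```

The overlap means a sentence cut at a chunk boundary still appears whole in the next chunk, which tends to improve retrieval.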
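Filtering duplicate links out of the similarity results, so the API returns one entry per source as described above, can be sketched like this. The `metadata.link` shape mirrors where the walkthrough says the link is stored, but treat the exact field names as assumptions.

```javascript
// Keep only the first similarity result per source link.
function dedupeByLink(results) {
  const seen = new Set();
  return results.filter((r) => {
    const link = r.metadata?.link;
    if (!link || seen.has(link)) return false; // drop missing or repeated links
    seen.add(link);
    return true;
  });
}
```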
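The streaming loop described above (check there's actually content, stop when the finish reason is "stop", and concatenate each chunk into a running total) can be sketched against an OpenAI-style stream; the chunk shape follows the OpenAI streaming schema, which Groq mirrors.

```javascript
// Consume an OpenAI-style streaming response and concatenate the
// delta contents into one string, stopping on finish_reason "stop".
async function collectStream(stream) {
  let responseTotal = "";
  for await (const chunk of stream) {
    const choice = chunk.choices[0];
    if (choice.finish_reason === "stop") break; // model is done
    if (!choice.delta?.content) continue;       // skip empty deltas
    responseTotal += choice.delta.content;
  }
  return responseTotal;
}
```

In the walkthrough the server buffers this total and responds once it's complete; you could instead write each delta straight to the HTTP response to stream it out to your own API consumers.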