
This comprehensive video guide demonstrates how Portkey's AI Gateway can simplify LLM integrations within applications. The tutorial explores how to interact with multiple AI providers, such as Mistral, Perplexity, and OpenAI, through the Gateway's universal API. The advantages discussed include caching, automatic retries, and fallbacks for error handling. The video also covers creating virtual keys, load balancing, building configurations, and canary testing, including how a canary weight controls what share of queries is sent to a different LLM and how to configure query retries. Finally, the speaker demonstrates the platform's logging feature.

00:00 Introduction to Portkey's AI Gateway
00:12 Understanding the Universal API
00:37 Benefits of Caching in AI Gateway
00:37 Exploring Supported AI Providers
00:59 Cost and Speed Advantages of Caching
01:22 Fallbacks and Automatic Retries
02:03 Load Balancing Across Models
02:42 Canary Testing for New Models
03:21 Creating and Using Virtual Keys
03:57 Exploring Portkey's Platform Features
04:14 Using the Observability Platform
05:19 Creating a Configuration with the GUI
05:49 Setting Up Virtual Keys
06:13 Starting a New Project with Bun or Node.js
08:39 Logging and Caching in Action
10:26 Conclusion and Final Thoughts

🔥 Don't forget to like, share, and subscribe for more!

🔗 Relevant Links:
https://github.com/Portkey-AI/gateway
https://portkey.ai/features/ai-gateway

👉 Follow me on Twitter for updates: [@dev__digest]
---
type: transcript
date: 2024-01-18
youtube_id: TpUwSmGfMrQ
---

# Transcript: AI Gateway: Enhancing LLM Integrations for Application Development

In this video I'm going to be showing you Portkey's AI Gateway, which is a way that you can simplify your LLM integrations within your applications. Let's say you want to interact with Mistral, you want to interact with Perplexity, and you want to interact with OpenAI. What AI Gateway gives you, first of all, is a universal API. Now, what do I mean by that? It allows you to query every model in the same way: all you have to do is swap out the model as well as the API key, and the response schema that you get back is going to be consistent across all of the different models. If you were to integrate this yourself, you'd have to contend with multiple different schemas and multiple different vendors, each with their own way of setting up their inference APIs.

One thing that really stands out with AI Gateway is the number of providers that are supported: OpenAI, Anthropic, Azure OpenAI, Cohere, Anyscale, Google PaLM, Google Gemini, Together AI, Perplexity, Mistral, AWS Bedrock, Azure ML, and more, and there's framework support for LangChain and LlamaIndex within the Python versions of their libraries.

The next thing AI Gateway allows you to do is create a simple and semantic caching layer. That caching layer is helpful for a number of reasons. It helps you save on your LLM cost by not having to continually query the LLM, especially if it's the same query, and it also helps with inference speed: instead of having to wait for the LLM's response, especially for something with a larger token count, caching can really help the latency within your application.

Another great feature built into AI Gateway is fallbacks. What fallbacks allow you to do is say, for whatever reason, there's an error from OpenAI when you're querying it; they allow you to specify another model so that if
there is an error on an endpoint, it will go ahead and try a different model. So say, in a scenario, you want to use OpenAI's GPT-3.5, and if for whatever reason there's an error on that endpoint, you want to default back to something like Anthropic's Claude.

Somewhat similar to fallbacks, there is also the ability to have automatic retries. What automatic retries allow you to do is query the model multiple times if there is a failure, so on certain status codes you can specify that you want to retry that query X number of times.

The other nice thing the Gateway allows you to do is load balance. If you have a number of different models within your application, you can specify that you want a certain portion of your queries to go to certain models, and you can specify the amount of weight that you want toward each model. This can be useful in a number of ways, just to name a couple. Say you're using a service where you're hosting your own model and you're incurring cost by actually running that GPU in the cloud; this is one mechanism you can use to make sure that you don't overwhelm the GPU you're using. It could also be useful if you have an allotment of free credits across different services and you want to make sure you use up those credits before actually incurring cost; this is a way you can balance that load across a number of different models.

Next up, a really cool feature is canary testing. Say you want to try out a new model, but you don't want to roll it out to all of your users; you want to test it on only a segment of your users. What canary testing allows you to do, similar to load balancing, is specify a particular weight for how many queries are sent to a different LLM. So say you want to test out something like Llama 2 on Anyscale: you can specify that you want 5% of your traffic routed to Anyscale or Perplexity and try out that inference API, routing a certain amount of traffic to a particular model. So while it sounds just like load balancing, and largely it is, using the same technique, it's really just to achieve a different outcome.

And finally, one of my favorite features is the ability to create virtual keys. Portkey's virtual key system allows you to securely store all of your different LLM keys in one place. I'm constantly having to log into different services depending on the videos I'm creating, and sometimes I'm limited in the number of API keys I can use, so I'm constantly logging into all of these different GUIs, whereas this gives me one platform where I can go in, store, and reach for all my API keys. There's also an added security benefit: say you're within an organization and you don't want to share an OpenAI key directly, and you want something put in front of it; this is a way you can have that virtual key and manage it.

The other nice thing with Portkey is there's an open-source repository, which you can check out and easily install to get started with your own server and route everything through that, or you can use their platform, which has a generous free tier you can try out. There are a number of nice things within their free tier beyond the AI Gateway: there's also a full-stack LLM observability platform that lets you track things like the number of tokens used, the cost, the latency per request, and the unique users of your application. A whole host of things are built into the platform, so I encourage you to make an account and check it out.

To give you a quick look at the observability platform, this is just a glance at sample data, so you can look through here and see the different things that could be interesting to you. Say you have a lot of errors on a
particular model, or you want to track cost or the number of tokens you're using; that's all built in here. The other nice thing with their platform is that if you want to track whether there are errors, whether the cache is being hit for particular queries, or how long things are taking, you can drill down to the level of each query and see what is happening on each query itself.

There is also the ability to create prompts here. If you put in the environment variables for the various services you're using, you can use this like the playground you would on OpenAI; the only difference here is you can interact with a host of different models and services all within one platform.

The other nice thing with the platform is this nice interactive GUI where you can create a config. Say you want to play around with how to set up the schema; this gives you a nice area where you can make sure it's a valid schema for what's ultimately going to be passed as the configuration object. The way this is useful is, say you put in an invalid schema, maybe you don't put something in the right place, or maybe you forget a comma; this will start to guide you down the path of what's actually acceptable and how you can correctly query the AI Gateway.

Then, to make a virtual key, like I mentioned earlier in the video, you can just go to the virtual keys tab, click create, select the provider you'd like to use, and put in your API key. Once it's created, you can see it in here; you have your example key that you can copy, and you can use this API key, which routes through the Portkey platform, to leverage that OpenAI key or whatever key you end up choosing in here.

I'm going to show you an example of how to get started with Bun or Node.js. You can start a new project: you can just run bun init, and then
you can get that package.json all set up. Once you have that set up, you can run bun install (or npm install) portkey-ai. Then you can touch .env, and that's where we'll put all our environment variables.

I'm going to show you an example with OpenAI and Perplexity. All you have to do is create a new secret key; you can name it something like "Portkey" if you want. Once you have that key, you can go ahead and create the virtual key: give it a name, make sure you specify OpenAI, and then just paste in the key there. Now that we have that, we see our virtual key here. You can copy that key and paste it into your environment file. In this case I named it OPENAI_VIRTUAL_API_KEY, and then you put in the value right there. Once that's all set up, you'll be able to reference that virtual key within the configuration.

Once you have at least one environment variable, all you have to do within their interface is copy your API key right from the bottom left corner and paste it into your environment variable. Once you have your Portkey key and a virtual key, you can go in here and specify your virtual key in two ways: you can pass it in the configuration when you instantiate Portkey, or you can pass it in when you perform something like a chat completion. Both will work; if you pass it in within the chat completion, that one takes precedence.

If I query this just like I'd use something like the OpenAI SDK and bun run it, you'll see that I have a test here, and it returned correctly.

I did want to point out the configuration: that object you can copy from their GUI. Once you have it all set up, you can paste it in here. So say you want to have a caching layer, or you want to specify the number of retries, or fallbacks, or canary, or all of those
things that I initially showed you; that's where you specify them.

Now, I just wanted to show you a second example. In the first example I'm using GPT-3.5 Turbo, so the OpenAI endpoint, and I'm specifying my virtual key. If I want to use something like Perplexity's API, I can specify the model, put in the virtual key I have for Perplexity, and query that as well. Just looking at these two, you can see how easy it is to switch from one model to another. This will save you a ton of time, because you don't have to write different parsing logic to handle all of the different schemas for all of the different model vendors; this does all of that heavy lifting and normalizing of the data for you, so it's consistent in how you query it as well as how the data is returned.

Now, the last thing I wanted to show you is the logs within their platform, which can be super handy. Just to give you a glimpse: for that query I sent, where I'm saying "this is a test," you can open that particular log and see how long it took, how much it cost, and the response, and the other nice thing is you can also see the caching status. If I go back here, make another query with the same message, and head back over to the logs, you'll see that it pops up right here, and we see that we now have a cache hit. So instead of that same message being routed again to OpenAI and incurring that cost, you can see I'm no longer incurring a cost from the LLM for the tokens that were used. You can also see that the request time for that response was significantly less than the original one: 126 milliseconds versus 576 milliseconds from the LLM.

While this is a small example, let's say I ask it to explain quantum mechanics to me in 10 sentences and run this. We know, just from specifying that we want 10 sentences, that this is going to take a considerable amount of time longer
than that initial query. But now, once we have that response, we can check it out: we can see that it took almost 9 seconds to complete. Now, if I run that one more time and look at the request timing for the new response, we can see that instead of almost 9 seconds we have 30 milliseconds, which is roughly a 300x improvement in latency. The faster your application, obviously, the more enjoyable it's going to be for users to use, and it can potentially save you a ton of money by not incurring all of those additional LLM costs, especially if it's an application where you don't need variability between the responses you send back to the user and you're able to send back the same response. This can be particularly useful for applications where you can have more consistent responses, especially for common requests.

There are a ton of other features within the platform that I'd encourage you to make an account and check out. Hopefully you found this useful. If you did, please like, comment, share, and subscribe, and otherwise, until the next one!
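To make the fallback, retry, and caching features described above concrete, here is a sketch of the kind of configuration object the video copies from the GUI. The field names follow my reading of Portkey's config schema and the virtual key names are placeholders; check Portkey's documentation for the exact shape.

```json
{
  "strategy": { "mode": "fallback" },
  "targets": [
    { "virtual_key": "openai-virtual-key" },
    { "virtual_key": "anthropic-virtual-key" }
  ],
  "retry": { "attempts": 3, "on_status_codes": [429, 500, 503] },
  "cache": { "mode": "semantic" }
}
```

With a config along these lines, a failing request to the first target is retried on the listed status codes, falls back to the second target on persistent errors, and repeated or semantically similar queries are served from cache.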
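The load-balancing and canary-testing idea above (most traffic to the incumbent model, a small weighted slice to a candidate) would translate to a weighted config roughly like the following. Field names, the Anyscale model identifier, and the weights are illustrative assumptions, not values shown on screen in the video.

```json
{
  "strategy": { "mode": "loadbalance" },
  "targets": [
    { "virtual_key": "openai-virtual-key", "weight": 0.95 },
    {
      "virtual_key": "anyscale-virtual-key",
      "weight": 0.05,
      "override_params": { "model": "meta-llama/Llama-2-70b-chat-hf" }
    }
  ]
}
```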
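The .env file created during the setup section would hold entries along these lines. OPENAI_VIRTUAL_API_KEY is the name used in the video; the Portkey and Perplexity entries are my assumed names, and all values are placeholders.

```ini
# .env — placeholder values, never commit real keys
PORTKEY_API_KEY=your-portkey-api-key            # copied from the Portkey dashboard
OPENAI_VIRTUAL_API_KEY=your-openai-virtual-key  # virtual key created for OpenAI
PPLX_VIRTUAL_API_KEY=your-pplx-virtual-key      # virtual key created for Perplexity
```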
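The model swap demonstrated near the end (GPT-3.5 Turbo via OpenAI versus a Perplexity model) comes down to changing two values. A minimal TypeScript sketch of that idea follows; the virtual key names and the Perplexity model name are placeholders, and the real call in the video goes through the portkey-ai SDK's chat.completions.create, shown here only in comments.

```typescript
// Switching providers through a universal API changes only two fields:
// the model name and the virtual key. The request shape stays identical.
type ChatMessage = { role: "user" | "system" | "assistant"; content: string };

type ChatRequest = {
  virtualKey: string; // Portkey virtual key for the provider (placeholders below)
  model: string;      // provider-specific model name
  messages: ChatMessage[];
};

function buildRequest(virtualKey: string, model: string, prompt: string): ChatRequest {
  return { virtualKey, model, messages: [{ role: "user", content: prompt }] };
}

// Same prompt, two providers: only the two fields differ.
const openaiReq = buildRequest("openai-virtual-key", "gpt-3.5-turbo", "This is a test");
const pplxReq = buildRequest("pplx-virtual-key", "pplx-70b-chat", "This is a test");

// With the portkey-ai SDK, each request would be sent roughly as:
//   const portkey = new Portkey({ apiKey: process.env.PORTKEY_API_KEY,
//                                 virtualKey: req.virtualKey });
//   const res = await portkey.chat.completions.create({
//     model: req.model, messages: req.messages });
console.log(openaiReq.model, "->", pplxReq.model);
```

Because the response schema is also normalized by the gateway, the parsing side needs no per-vendor branches either.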