
Learn The Fundamentals Of Becoming An AI Engineer On Scrimba: https://v2.scrimba.com/the-ai-engineer-path-c02v?via=developersdigest

OpenAI DevDay: 4 Major API Updates Explained

Explore the latest announcements from OpenAI's DevDay! This video dives into four significant updates: the Realtime API for natural speech conversations, vision fine-tuning for enhanced visual capabilities, prompt caching for cost efficiency, and model distillation for creating efficient small models from the outputs of larger ones. Learn how these innovations can improve your applications.

Links:
- OpenAI DevDay: https://openai.com/devday/
- Introducing the Realtime API: https://openai.com/index/introducing-the-realtime-api/
- Introducing vision to the fine-tuning API: https://openai.com/index/introducing-vision-to-the-fine-tuning-api/
- Prompt Caching in the API: https://openai.com/index/api-prompt-caching/
- Model Distillation in the API: https://openai.com/index/api-model-distillation/
- Pricing: https://openai.com/api/pricing/
- Repo: https://github.com/openai/openai-realtime-console

Chapters:
- 00:00 Introduction to OpenAI's DevDay Updates
- 00:09 Exploring the Realtime API
- 01:19 Function Calling Capabilities
- 02:06 Pricing and Availability
- 03:01 Fine-Tuning the Image API
- 03:28 Understanding Prompt Caching
- 04:11 Model Distillation Explained
- 04:56 Open Source Repository and Conclusion
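The audio pricing covered in the video (chapter 02:06) can be sanity-checked with a bit of arithmetic. This sketch uses the per-million-token rates quoted in the video; the tokens-per-minute figures are assumptions back-derived from OpenAI's quoted per-minute estimates, not official constants.

```python
# Rough cost arithmetic for Realtime API audio pricing. The per-million-token
# rates are the quoted preview prices; the tokens-per-minute figures are
# assumptions implied by the quoted ~$0.06/min input and ~$0.24/min output.

AUDIO_INPUT_PER_M = 100.0     # dollars per 1M audio input tokens
AUDIO_OUTPUT_PER_M = 200.0    # dollars per 1M audio output tokens

INPUT_TOKENS_PER_MIN = 600    # assumption: $0.06/min implies ~600 tokens/min
OUTPUT_TOKENS_PER_MIN = 1200  # assumption: $0.24/min implies ~1200 tokens/min

def audio_cost(minutes_in: float, minutes_out: float) -> float:
    """Estimated dollar cost of a voice conversation."""
    cost_in = minutes_in * INPUT_TOKENS_PER_MIN * AUDIO_INPUT_PER_M / 1_000_000
    cost_out = minutes_out * OUTPUT_TOKENS_PER_MIN * AUDIO_OUTPUT_PER_M / 1_000_000
    return cost_in + cost_out

# A 10-minute back-and-forth, roughly half listening and half speaking:
print(f"${audio_cost(5, 5):.2f}")  # → $1.50
```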
---
type: transcript
date: 2024-10-01
youtube_id: wzRYBfntk6M
---

# Transcript: OpenAI DevDay in 5 Minutes: 4 Major API Updates

OpenAI's DevDay just happened, and it comes with four big updates: the Realtime API, vision fine-tuning, prompt caching, and model distillation.

First, let's dive into the Realtime API. This is how developers can build applications similar to ChatGPT's Advanced Voice Mode, which you've probably seen demos of. The new Realtime API supports natural speech-to-speech conversations, and they're also introducing audio input and output in the Chat Completions API. One of the nice things with the Chat Completions route is that you can pass text, audio, or both as input.

The Realtime API works through a persistent WebSocket connection. One of the issues with the previous approach, if you wanted to pass speech to a model, is that you'd first have to run it through something like Whisper, and then pass the text you got back from Whisper into a model for inference. As OpenAI describes it, that approach results in loss of emotion, emphasis, and accents, on top of the added latency. With the Realtime API, as you're speaking into your microphone it streams that audio to the OpenAI endpoint, and vice versa: the responses are streamed back for you to consume within your application.

Now, something that I'm personally excited about is that this API supports function calling. Just think about that: you can speak to the model, and if it detects that a function needs to be invoked, whether within a web app or a mobile app or what have you, it could change the UI in your application or do something to that effect. Function
calling is what's going to give developers the ability to create killer applications on top of this and go beyond just making a wrapper on the voice-to-voice interaction. You're going to be able to build genuinely interesting things.

Now, in terms of availability: if you're going the WebSocket route of the Realtime API, there's the gpt-4o-realtime-preview endpoint; alternatively, if you're using the Chat Completions API, you can use gpt-4o-audio-preview. In terms of pricing for the realtime preview, text is broken out at $5 per million input tokens and $20 per million output tokens, but the big numbers are the audio rates: $100 per million input tokens and $200 per million output tokens. That sounds like a lot, and it definitely is, but this is brand new and I'd imagine these prices will trend lower over the coming year. To put it into perspective, that's about 6 cents per minute of audio input and 24 cents per minute of audio output; just imagine having a conversation back and forth with the model and you'll get a pretty good sense of what it would cost. With that said, we have seen over the past years, with OpenAI and other model providers, that models are most expensive when they first come out, and in the months and years afterward prices trend dramatically lower. Keep that in mind when you're considering building an application with this.

Another great thing: you can now fine-tune models on images in the fine-tuning API. I think we're going to see a lot more use cases for agents in the browser, on your laptop, or even on your mobile device, now that you can pass in images and fine-tune a model for your specific use case.

Next up is prompt caching. This is something we first saw from Google with their Gemini Flash model, I believe, and then more
recently we also saw it with the Claude series of models. What it allows you to do is this: if your application passes the same context to the API repeatedly, that repeated context can be cached instead of being billed at full price on every request. Looking at the pricing table, this applies essentially across the board, including the new models, and cached input tokens are priced at half the price of regular input tokens. We'll put all of the links in the description of the video if you're interested in diving into any of this further.

The next thing is model distillation in the API: you can fine-tune a cost-efficient model with the outputs of a larger model, all within OpenAI's platform. What this allows you to do is take a model like o1-preview or GPT-4o and use its outputs to fine-tune and improve the performance of a model like GPT-4o mini. Model distillation, as they call it, involves fine-tuning smaller, cost-efficient models using the outputs of more capable models. Say you have a particular use case, but you can't use the o1 series of models because it's cost-prohibitive, or maybe it's not fast enough; you can distill that capability into their smaller models. There are also a couple of examples in the playground which you can check out.

All right, the last thing I wanted to show you is that they also put out an open-source repository demonstrating the Realtime API. In this example you can see streaming to the server and streaming to the client, and it also includes examples of how the function-calling capability works. I'll put the link in the description of the video as well. Otherwise, that's it for this video. If you found it useful, please like, comment, share, and subscribe. Until the next one!