
This video covers OpenAI's spring update, focusing on the announcement and demonstration of the GPT-4o ("Omni") model, which reasons across audio, vision, and text in real time. The video includes a practical demonstration of a math tutoring session, showcasing the model's interactive, dynamic responses. It also covers the introduction of a ChatGPT desktop app and a web UI update aimed at enhancing practical applications for developers and users. The presenter tests the model's response time and shares insights on availability for free users, pricing details, and the model's potential impact on AI development and existing startups.

Chapters:
- 00:00 Introduction to OpenAI's Spring Update
- 00:17 Demonstration of the GPT-4o Model
- 00:22 Interactive Math Tutoring with GPT-4o
- 01:56 A Casual Conversation and Announcement Tease
- 02:31 Revealing the New Model's Capabilities
- 03:39 Exploring the ChatGPT Desktop App and Web UI Update
- 04:30 Demonstration Insights and Accessibility
- 05:43 GPT-4o Omni Model Features and Accessibility
- 07:27 Pricing, Context Window, and Capabilities of GPT-4o
- 09:03 Final Thoughts and Call to Action
---
type: transcript
date: 2024-05-13
youtube_id: SLz_kxqBJ_Q
---

# Transcript: GPT-4 Omni: Exploring OpenAI's Latest Flagship Model

In this video I'm going to be going over the latest announcements from OpenAI in their spring update and what they released today. First off is the GPT-4o model, which is arguably the biggest part of the announcement. This is their latest flagship model, which is able to reason across audio, vision, and text in real time. I'm going to show you a quick demonstration of it before I go further within the video here.

"Hello there, I'm here with my son, and I'd love you to tutor him on this math problem. Don't give him the answer; you can ask questions and nudge him in the right direction, but I really want to make sure he understands it himself. He's here in the room, so you can talk to him directly."

"Of course, I'd be happy to help. Let's look at the problem together. Can you first identify which sides of the triangle are the opposite, adjacent, and hypotenuse relative to angle alpha?"

"All right, so I'm pretty sure this is the angle alpha right here, right?"

"Correct. Now, looking at the triangle, which side do you think is the hypotenuse?"

"Um, I'm not totally sure. I think it might be this one, but I really am not sure. This side, AC?"

"You're close. Actually, side AC is called the adjacent side to the angle alpha. The hypotenuse is the longest side of a right triangle and is directly opposite the right angle. Can you find which one is the hypotenuse?"

"Oh, okay, I see. So I think the hypotenuse is this really long side from A to B. Would that be correct?"

"Exactly, well done. Now that you've identified the hypotenuse, can you identify which..."

"Hey, how's it going?"

"Hey there, it's going great. How about you? I see you're rocking an OpenAI hoodie, nice choice. What's up with that ceiling, though? Are you in a cool industrial-style office or something?"

"Well, can you take a guess at what I might be doing based on what I'm showing you here?"

"From what I can see, it looks like you're in some kind of recording or production setup, with those lights, tripods, and possibly a mic. It seems like you might be gearing up to shoot a video or maybe even a live stream."

"Yeah, in fact, we've got a new announcement to make."

"That's exciting. Announcements are always a big deal. Judging by the setup, it looks like it's going to be quite the professional production. Is this announcement related to OpenAI, perhaps?"

"It is, and in fact, what if I were to say that you're related to the announcement, or that you are the announcement?"

"Me? The announcement is about me? Well, color me intrigued. Are you about to reveal something about AI, or more specifically about me as a part of OpenAI? You've got me on the edge of my... well, I don't really have a seat, but you get the idea. What's the big news?"

"Yeah, we've got a new model that can interact with the world through audio, vision, and text."

As you can see here, this unlocks a ton of different use cases. We have one endpoint where we'll be able to pass in audio, vision, and text as inputs and get responses back in different modalities. That shows the capability of the new endpoint that's going to be available for developers, and there's a ton of different applications it's going to unlock.

The other exciting part of this announcement is there's now going to be a ChatGPT desktop app as well as a web UI update. I'm really interested in the ChatGPT desktop app because they demonstrated some capabilities where you'll be able to interact with what's on your screen. You can imagine, say, you're coding a problem and you want to ask ChatGPT a question about it: you'll be able to select that ChatGPT icon and say something like, "I'm having a problem with this particular function, can you help me solve it?" and it would respond back to you just like you saw with that audio and help you solve that function. It's starting to lean into more practical use cases, where we're not just going to have to type out all of these different messages into an interface and get text-based responses back; the applications we're going to be able to build now are going to be a lot more dynamic and fluid.

With that being said, another interesting part that you might not have seen in that quick demonstration is the ability to interrupt the model, but also the intonation of the responses coming back from it. It seems to really pick up on the emotion that you're trying to convey, whether you're upset or concerned; it's able to detect those various emotions within your voice. Another thing that's really neat in the demo is they asked the TTS to respond back with a robotic voice. They said something like "respond back in a robotic voice," and it was able to respond in that style and tone. What's also really impressive in this demo is that it's able to respond back in as little as 232 milliseconds, which is an incredible response time for something that can both reason and respond back with TTS.

The other thing they announced, which is really great, is that it's going to be available to all ChatGPT free users. They mentioned that they want this model's capabilities to be in the hands of as many people as possible. That's going to be a really big unlock, where you don't need to sign up for a $20-a-month account to access their latest flagship model. To access GPT-4o, you can go on over to the OpenAI platform. I do have a premium account, so if you don't see this quite yet, hopefully you will see it as a free user over the coming weeks. As it looks right now, you can just select it right above GPT-4. If I just go ahead and say "hello world," we see it respond back. If I say "a short essay on landing on the moon," the one thing you will notice is that this model is much, much faster than GPT-4. It's going to be interesting to see whether they continue to keep GPT-4 in its current state within the ChatGPT interface.

There are a couple of new features within the UI that you'll see here if you want to change the model to test it out. Say you want to try the GPT-4 model: you can see right away that it's responding back at a much slower pace. So if I just go ahead and refresh this here... The other thing that I did test is uploading a video. I actually thought the video would process and it would be able to understand the request. If I go ahead and say, "What is in this video?", what it's doing off the bat is using the code interpreter. I thought it would have been leveraging some vision capability under the hood to detect what's within that video, but here it's giving me a high-level overview of the video itself from running Python scripts. I was a little bit surprised; from the announcement I sort of thought we'd be able to pass in these videos directly, but hopefully that is something that is ultimately coming down the pike. I'd encourage you to play around with this new GPT-4 Omni model, and if anyone's a non-paid member of ChatGPT, let me know in the comments below whether you're able to access it.

A couple of other things on GPT-4o in terms of the pricing and the context window: GPT-4o is half the price of GPT-4 Turbo on both input and output. Input is $5 per million tokens, and output is $15 per million tokens. You're able to pass in up to 128,000 tokens of context, and it has a knowledge cutoff of October 2023. The other thing they mentioned, which I just wanted to highlight, is that this model is going to be capable of a ton of different things. They gave some really good explorations of the capabilities here; you can imagine things like writer's block or visual narratives. It's a huge unlock: you can just think of the funnel, where the modalities in are eventually going to match the modalities out, and you can imagine all of the different use cases that this will unlock for a number of different developers.

But as usually happens whenever OpenAI releases these new updates to their models or interfaces, there are a number of AI startups that aren't necessarily happy, let's say. There are similar capabilities that other AI startups are focused on within the vertical they just released into, and it's going to be interesting to see how that plays out and how these things are adopted. I'm going to be making some more videos diving into the GPT-4o, or Omni, model, as they put it, over the coming days and weeks. I'm going to show you how to leverage it within your application and some ways you can build with it, but I just wanted to give you a really quick update on the OpenAI announcement and their spring update today. So that's it for this one; if you found this video useful, please like, comment, share, and subscribe, and otherwise, until the next one.
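To make the single multimodal endpoint and the pricing above concrete, here is a minimal Python sketch. The payload shape follows the OpenAI Chat Completions API's content-parts format for text plus images; the image URL and token counts are illustrative placeholders, and the prices are the $5/$15 per million tokens quoted in the video. Actually sending the request would require the `openai` package and an API key, so this only builds and prices a request.

```python
# Pricing quoted in the video for GPT-4o (assumption: unchanged at read time).
INPUT_PRICE_PER_MTOK = 5.00    # USD per 1M input tokens
OUTPUT_PRICE_PER_MTOK = 15.00  # USD per 1M output tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a request's cost in USD from its token counts."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


# A multimodal request body: text plus an image in one user message,
# as the announcement's single audio/vision/text endpoint suggests.
# The URL is a placeholder, not a real asset.
payload = {
    "model": "gpt-4o",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/frame.png"}},
        ],
    }],
}

# Worst case: fill the full 128,000-token context, get a 1,000-token reply.
print(f"${estimate_cost(128_000, 1_000):.3f}")  # prints $0.655
```

At these rates even a maxed-out 128k-token prompt costs well under a dollar, which is what makes the "half the price of GPT-4 Turbo" point meaningful for developers.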