
# Unveiling GPT-4o Mini: The Next Evolution in AI - Affordable, Fast, and Multimodal

In this episode, we dive into OpenAI's latest model, GPT-4o Mini, which is both cost-effective and highly intelligent. We compare its performance to competitors like Gemini Flash and Claude Haiku, highlighting its standout feature: multimodal capabilities. This model is significantly cheaper than GPT-3.5 Turbo, priced at 15 cents per million tokens of input and 60 cents per million tokens of output. It excels at tasks from coding to vision, scoring highly on benchmarks like MMLU and HumanEval for coding. We discuss the model's potential applications and its availability across different tiers of ChatGPT, with plans for future expansion in multimodal support. Tune in to explore how GPT-4o Mini is set to revolutionize AI with its affordability and versatility.

00:00 Introduction to GPT-4o Mini
00:09 Competitive Landscape and Multimodal Capabilities
00:39 Pricing and Cost Efficiency
01:11 Performance Metrics and Benchmarks
01:47 Future Multimodal Support
02:32 Token Context and Training Data
02:51 Chatbot Arena and Competitive Edge
04:00 Applications and Use Cases
05:26 Coding Performance and Real-Time Testing
06:54 Partnerships and Safety Measures
07:28 Availability and Pricing
08:17 Conclusion and Final Thoughts
---
type: transcript
date: 2024-07-18
youtube_id: scgCtRoIfT4
---

# Transcript: GPT-4o Mini: OpenAI's Multimodal Marvel UNVEILED

All right, we have a new model from OpenAI: GPT-4o Mini, and this is a really impressive model. It's cheap, it's fast, and it's very intelligent. If we plot it against some of the competitors, that's within the ballpark of where they're positioning GPT-4o Mini. This model really competes with Gemini Flash as well as Claude Haiku, and there are a number of reasons for that: the size, the speed, but possibly most notably the multimodal capability. While there are some open-source models out there, there aren't really any open-source multimodal models right now that stack up against GPT-4o Mini, Gemini Flash, or Claude Haiku.

I'll go over the blog post in just a moment, but in terms of pricing, this model is 30 times cheaper than GPT-4o. It's also significantly cheaper than even GPT-3.5 Turbo: 15 cents per million tokens of input and 60 cents per million tokens of output. When you compare that to GPT-3.5 Turbo, which is 50 cents per million tokens of input and $1.50 per million tokens of output, it's considerably cheaper across the board. And mind you, this also has vision capability built right into the model.

I wanted to pull up this chart I saw from Artificial Analysis, which is a really great account to follow; they benchmark a ton of different LLMs across a number of different metrics as well as different providers. The chart plots MMLU, which you can think of as the intelligence of the model, against the price of the model, and GPT-4o Mini really stands in a category of its own. You can see that it's cheaper than Gemini 1.5 Flash as well as Claude 3 Haiku, and it has a significant bump in that MMLU score. Like we already talked about, another thing to consider when choosing a model is whether you'll need multimodal input like images and whatnot.
On the multimodal note, if we hop back to the blog post, what's exciting is that as of today it supports text and vision within the API, but there are plans to support text, image, video, and audio in the future. We don't know exactly when, but I think when that happens it's going to be a really big shift in how we choose models, where all of a sudden you don't need to stitch together all of these different services. You might not need Whisper, you might not need a separate vision model or a text model; you can just pass in all of your input and get the response you're looking for. I'd imagine that over the next six months or so we'll start to see more multimodal models that are full-featured, models that don't just have those two modalities but rather a plethora of different modalities and mediums you can pass in and get responses from.

The other thing to note with this model is that it has a context window of up to 128,000 tokens, and the training data goes up to October 2023.

The other thing I wanted to point out is how it ranked within the arena. You can go to the Chatbot Arena website, put in your query, and choose your preferred response from two models that stream out side by side. What's interesting is that this model even outperformed Claude 3 Opus, which was the flagship model just a number of months ago. That just goes to show how competitive this space is right now. We're starting to see all of this leapfrogging between Google, Anthropic, and OpenAI, as well as some notable open-source competitors like Llama 3 and other models with really strong performance like Qwen and Yi. It has preferred responses across almost 7,000 votes, which ranks it up there with the flagship models. And the thing to keep in mind with all of this is that we still have GPT-5 on the horizon, so presumably that's all trained already.
It's getting ready, and potentially it could be released at any time, maybe later in the summer; we don't know yet. But that's a thing to consider: this isn't really even the forefront of what's going to be available to all of us soon. It goes without saying, though, that this is going to be a really popular option for a ton of different applications, just in terms of the cost and performance.

This is a model that's geared toward applications that make a number of different model calls, whether that's with LangChain or LangGraph. It's not just one simple text-in, text-out response; there might be a graph, or a "cognitive architecture" as Harrison Chase puts it, where the application goes and does a number of different inference calls depending on where it is within the application and what it's being asked to do. This model is really geared to those types of applications, though it can definitely be used in a conversation-history scenario: if you're passing in the whole history and context of a conversation, this is going to be a really cost-effective way to do that while still getting good results in terms of the responses and the quality of the model overall.

MMLU is always the flagship metric for these models when they come out; you'll always see MMLU as the first benchmark, and it measures the general intelligence of the model. You can see that GPT-4o Mini outperforms pretty much across the board, and the widest margin is really from GPT-3.5 Turbo to GPT-4o Mini, in my opinion, because that's the model they're aiming to replace on their platform. It's not going to happen overnight, but arguably you can switch to this model and your application will be considerably better and cheaper overnight. You don't need to look to another provider or anything; you can just change that model string and you'll be off to the races, and you'll even have new capabilities like being able to use images without too much extra effort.
Since I often focus on coding on this channel, I wanted to point out that the HumanEval score, which is the coding benchmark, came in at 87% for this model. When you compare that to Gemini Flash or Claude Haiku, those score 71.5% and 75.9% respectively. With that being said, I wanted to test how it performs at creating artifacts, which was popularized in the Claude interface. While I'm here, I'll just show my quick input. I'm going to say: give me the following - an SVG of a smiley face, a React sign-up form, a Mermaid org chart of a tech company, an HTML/CSS/JS game of Flappy Bird, and finally a React counter application. Let's submit that and see what this model does. We see it streaming out in real time. It's created our SVG for us, it created a sign-up form component, and it's going through all of this very quickly. Let's see how it did. We have our React component of a simple counter here, so you can see all of the code. We have our Flappy Bird game; it's definitely got a little ways to go, but it's a starting point. If we go over to the tech company org chart, we have that as well. We see that it's able to create a sign-up form, and we have our smiley face. So this one really cheap model is able to do all of that right off the bat. In terms of the work that I do, this is definitely going to be a model I'll be leveraging in some of the upcoming applications I'm building out for everyone.

To go through a couple of other things within the blog post: they mention that they partnered with companies like Ramp and Superhuman, which both found that GPT-4o Mini performed significantly better than GPT-3.5 Turbo for tasks such as extracting structured data from receipts or generating high-quality email responses when provided with thread history. There are, of course, the at-this-point almost obligatory built-in safety measures that you can expect from OpenAI, Google, and Anthropic whenever they release a model.
There's always some good information you can read up on about the safety practices they've implemented within the model. In terms of availability and pricing, it's available now through the Chat Completions API as well as the Batch API, and they're even going to be rolling out fine-tuning for GPT-4o Mini in the coming days. As for ChatGPT, it's going to be available on the Free, Plus, and Team tiers, where you'll be able to access GPT-4o Mini starting today. I just checked before recording this video and didn't quite see it yet; if you have access, let me know in the comments, and let me know your experience and what you think of the model. I'd really be interested to hear.

Another thing I love that they highlight within the blog post is that the cost per token of GPT-4o Mini has dropped 99% since text-davinci-003, which came out just two years ago. They say they're committed to continuing this trajectory of driving costs down while enhancing model capabilities. This is going to unlock a ton of new use cases, and I'm excited to play around with it. That's pretty much it for this video. If you found it useful, please like, comment, share, and subscribe. Otherwise, until the next one!
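The pricing comparison in the episode can be sketched as a quick back-of-the-envelope calculation. The per-million-token prices below are the ones quoted above; the workload sizes are purely illustrative, and you should verify current prices against OpenAI's pricing page.

```python
# Per-million-token prices (USD) as quoted in the episode; these change
# over time, so check OpenAI's pricing page before relying on them.
PRICES = {
    "gpt-4o-mini":   {"input": 0.15, "output": 0.60},
    "gpt-3.5-turbo": {"input": 0.50, "output": 1.50},
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a workload for the given model."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Illustrative workload: 10M input tokens, 2M output tokens.
mini = cost_usd("gpt-4o-mini", 10_000_000, 2_000_000)     # 1.50 + 1.20 = 2.70
turbo = cost_usd("gpt-3.5-turbo", 10_000_000, 2_000_000)  # 5.00 + 3.00 = 8.00
print(f"gpt-4o-mini: ${mini:.2f}, gpt-3.5-turbo: ${turbo:.2f}")
```

For this workload the older model costs roughly three times as much, which matches the "considerably cheaper across the board" point made in the episode.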
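Since vision input is built into the model, here is a rough sketch of what a text-plus-image request body looks like in the OpenAI Chat Completions message format. The prompt and image URL are placeholders; actually sending the request would mean passing this dict to the official `openai` SDK (e.g. `client.chat.completions.create(**request)`), which this snippet deliberately avoids so it stays self-contained.

```python
# Sketch of a text + image request in the Chat Completions message format.
# The URL and prompt are illustrative placeholders; no network call is made.
def build_vision_request(prompt: str, image_url: str) -> dict:
    return {
        "model": "gpt-4o-mini",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

request = build_vision_request(
    "What's in this image?",
    "https://example.com/receipt.png",
)
```

This is the same shape of payload you would use for the receipt-extraction use case mentioned above: one user message whose content mixes a text part and an image part.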
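The "just change the model string" point also applies to the conversation-history scenario discussed above: the chat message format stays the same, so replaying prior turns works identically across models. A minimal sketch, with an invented `build_chat_request` helper and illustrative history (only the payload is built; sending it would again go through the `openai` SDK):

```python
def build_chat_request(model: str, history: list[dict], user_msg: str) -> dict:
    """Build a Chat Completions request that replays prior conversation turns.

    Swapping models (e.g. "gpt-3.5-turbo" -> "gpt-4o-mini") only changes the
    `model` string; the message-history format is unchanged.
    """
    return {
        "model": model,
        "messages": history + [{"role": "user", "content": user_msg}],
    }

# Illustrative conversation history passed in full on each call.
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize our launch plan."},
    {"role": "assistant", "content": "1) Beta in July, 2) GA in September."},
]
req = build_chat_request("gpt-4o-mini", history, "What ships in July?")
```

Because the whole history is resent on every turn, input tokens dominate the bill in this pattern, which is exactly where the cheaper per-input-token pricing pays off.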