
Exploring OpenAI o1: The Next Leap in AI Reasoning

In this video, I dive into OpenAI's new o1 model, a groundbreaking series of AI designed to spend more time thinking before responding. I'll cover the notable advancements in reasoning capabilities, improvements in solving complex problems in science, coding, and math, and provide insights into its performance benchmarks. Learn about the model's applications, pricing, and accessibility, as well as the distinctions between o1-preview and the cost-effective o1-mini. Stay tuned for future demonstrations and detailed analysis of this exciting new AI technology.

Links:
- https://openai.com/o1/
- https://openai.com/index/introducing-openai-o1-preview/
- https://openai.com/o1/#introducing-openAI-o1-preview
- https://openai.com/o1/#learning-to-reason-with-llms

Videos:
- https://openai.com/o1/#ui-video
- https://openai.com/o1/#cognition-video
- https://openai.com/o1/#mario-video

00:00 Introduction to OpenAI's New o1 Model
00:30 Overview of o1 Model Features and Capabilities
01:32 Performance and Evaluation Metrics
03:10 Access and Usage Details
04:28 Future Developments and Conclusion
---
type: transcript
date: 2024-09-12
youtube_id: n9g9Dpe0rxY
---

# Transcript: OpenAI o1: The Next Leap in AI Reasoning

In this video I'm going to be going over OpenAI's new o1 models. "We've developed a new series of AI models designed to spend more time thinking before they respond. Here is the latest update on o1 research, product, and other information. Try it in ChatGPT Plus or from the API." On this main page, openai.com/o1, there are a ton of different things you can read through: blog posts as well as technical papers, or their "system card" as they're calling it, and you can take a look at a few different examples.

If we take a look at the first one here, "Introducing OpenAI o1-preview": this is a new series of reasoning models for solving hard problems, and it's available today. They've developed a new series of AI models designed specifically to spend more time thinking before they respond. They can reason through complex tasks and solve harder problems than previous models in science, coding, and math. Today they're releasing the first of this series in ChatGPT and within the API. This is a preview, and they expect regular updates and improvements; alongside this release, they're also including evaluations for the next update, currently in development.

Essentially, how it works is through complex reasoning using chain of thought. They mention: "We trained these models to spend more time thinking through problems before they respond, much like a person would. Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes." On challenging benchmark tasks in physics, chemistry, and biology, we potentially have a model, or a series of models, performing at the PhD level in STEM. They mention it also excels in math and coding: in a qualifying exam for the International Mathematics Olympiad, GPT-4o solved only 13% of the problems, while the reasoning model scored 83%. Its coding abilities were evaluated in contests and reached the 89th percentile in Codeforces competitions. You can read more about this in the technical research post.

They mention that as an early model it doesn't yet have many of the features that make ChatGPT useful, like browsing the web for information and uploading files and images. For many common cases GPT-4o will be more capable in the near term, but for complex tasks this is a significant advancement and represents a new level of AI capability. One of the lines that really stood out: "Given this, we are resetting the counter back to 1 and naming this series OpenAI o1." This might mean that this is the new paradigm they're aligning towards the AGI goal they have as a company. There is some safety information, as you would expect from a company like OpenAI. There are also four really great videos here, which I'll link in the description of the video if you're interested: one each on quantum physics, genetics, economics, and coding.

To continue through this: the o1 series excels at accurately generating and debugging complex code, offering an efficient solution for developers. They're also releasing o1-mini, a faster, cheaper reasoning model that is particularly effective at coding. o1-mini is 80% cheaper than o1-preview, making it a powerful, cost-effective model for applications that require reasoning but not broad world knowledge.

Now, in terms of how to use o1: ChatGPT Plus and Team users will be able to access it today, and both o1-preview and o1-mini can be selected manually in the model picker. At time of launch they're going to have 30 messages for o1-preview and 50 for o1-mini. "We are working to increase those rates and to enable ChatGPT to automatically choose the right model for a given prompt." That's interesting in itself: they're going to have somewhat of a router, where maybe for simpler problems it routes you to something like GPT-4o mini, and for the hardest problems it routes you to something like o1-preview.

Right now, unfortunately, developers can only access the API if you're on tier 5, and the rate limit for those users is only 20 requests per minute. They mention they're working to increase these limits. The other interesting thing is that the API for this model doesn't include function calling, streaming, support for system messages, or other features. It's really bare-bones: it looks like you're going to send in text and get a block of text back. You won't even have that streaming effect on the other end like we're used to when working with their API.

The other great thing is they're planning to bring o1-mini access to all free ChatGPT users, so this model will ultimately be widely accessible, which is nice to see. They mention this is an early preview of the reasoning models in ChatGPT and the API, and that in addition to model updates, they expect to add browsing, file and image uploading, and other features to make them more useful to everyone. If we just think about that for a moment: if we have this incredibly powerful model that, in addition to the text question we put in, has access to the internet, that alone is potentially a huge leap, but it's not something that's available out of the box with this release.

In the very last line they do actually say they plan to continue to develop and release the GPT series of models in addition to this new series. So it looks like they're going to continue the GPT-3, GPT-4, GPT-5 type of stream alongside this new stream that looks to be aimed at solving much harder problems.

There are some other great demonstrations here, including a good example where it gives you a really thorough description of what you're asking for. There are examples of how this works within ChatGPT: you'll see that it "thought for X seconds," it goes over a high-level overview of the steps of thought it had, and then it ultimately gives you the output for what you asked. One example is where they asked for a game of Snake; the instructions were: "Use arrow keys to move the koala. Avoid strawberries. After 3 seconds a squirrel will appear. Find and touch the squirrel to win. Press any key to start." If I play the latter half of the video, we see "You found the squirrel!" and then the koala, and "You were hit by a strawberry. You lose." It's a simple little game, but given how complex the instructions were, getting this in one shot is, it goes without saying, very impressive.

Now, some specifics for the models. o1-preview has a context limit of 128k tokens and a knowledge cutoff of October 2023, and it's going to be $15 per million input tokens and $60 per million output tokens. o1-mini, the fast, cost-efficient reasoning model tailored to coding, math, and science use cases, also has a 128k context and an October 2023 knowledge cutoff, but the pricing is significantly cheaper: $3 per million input tokens and $12 per million output tokens.

Now let's take a quick look at some of the evaluation metrics they have here: a math competition, a code competition, and PhD-level science questions. For competition math, the score jumps from 13.4 to 83, and for competition code it goes from 11 all the way up to 89, so o1 greatly improves over GPT-4o on challenging reasoning benchmarks. One of the big flagship metrics usually touted with these LLM releases is MMLU, and what's interesting here is they actually tucked it away, but it does score 92.3% on MMLU. If you look at the different exams as well as the MMLU categories, you can see how it rates across different areas: in chemistry, math, and law you see pretty significant improvements, basically across the board on all of these evaluation metrics.

If you're interested, I'm going to put all of these links in the description of the video where you can check them out, but otherwise I encourage you to check it out and try it in ChatGPT if you have a ChatGPT Plus account. I plan on making future videos with demonstrations of my experience using it, trying to solve potentially hard coding problems, so if you're interested in that type of content, subscribe to the channel. I'll have more videos over the coming days, but otherwise that's it for this one. If you found this video useful, please like, comment, share, and subscribe. Until the next one!
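The bare-bones API shape described in the transcript (no streaming, no function calling, no system messages: plain text in, a block of text out) can be sketched roughly as follows. This is a hypothetical illustration, assuming the official `openai` Python SDK with an `OPENAI_API_KEY` set; the `build_o1_request` helper and the prompt are my own for illustration, not from the announcement.

```python
def build_o1_request(prompt: str) -> dict:
    """Build the only request shape o1 accepts at launch: a single user
    message, with no system prompt, no tools, and no streaming flag."""
    return {
        "model": "o1-preview",
        "messages": [{"role": "user", "content": prompt}],
    }


def ask_o1(prompt: str) -> str:
    """Send the request and return the complete response text.
    Requires an API key on a tier-5 account at launch."""
    from openai import OpenAI  # imported here so the helper above stays usable offline

    client = OpenAI()
    resp = client.chat.completions.create(**build_o1_request(prompt))
    # No streaming: the whole answer arrives as one block of text.
    return resp.choices[0].message.content


# Example call (commented out; needs network access and a funded key):
# print(ask_o1("Prove that the square root of 2 is irrational."))
```

The design point worth noting is how much of the usual chat-completions surface is absent: anything you would normally put in a system message or a tool definition has to be folded into the single user prompt.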
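The pricing quoted in the transcript ($15/$60 per million input/output tokens for o1-preview, $3/$12 for o1-mini) can be turned into a quick worked example. The token counts below are made up purely for illustration, and `cost_usd` is my own helper, not part of any SDK.

```python
# Launch pricing in USD per million tokens, as quoted in the video.
PRICING = {
    "o1-preview": {"input": 15.00, "output": 60.00},
    "o1-mini":    {"input": 3.00,  "output": 12.00},
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call: tokens times the per-million-token rate."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# A hypothetical reasoning-heavy call: 2,000 input tokens, 10,000 output tokens.
preview = cost_usd("o1-preview", 2_000, 10_000)
mini = cost_usd("o1-mini", 2_000, 10_000)
print(f"o1-preview: ${preview:.2f}")  # $0.63
print(f"o1-mini:    ${mini:.2f}")     # $0.13
```

With both rates scaled by the same factor of five, o1-mini comes out exactly 80% cheaper on any input/output mix, matching the "80% cheaper than o1-preview" claim in the announcement.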