
🔍 OpenAI's Game-Changing API Update: 100% Reliable JSON Outputs Explained! In this video, we delve into OpenAI's latest API update introducing structured outputs. We explore how it differs from JSON mode by ensuring outputs conform to provided schemas, and discuss its implications for developers. Key topics include function calling, response format parameters, and safety measures. We also cover limitations like additional latency on first requests and potential model hallucinations. Discover how this innovation can simplify data extraction and enhance application reliability.

00:00 Introduction to OpenAI's Structured Outputs
00:11 Understanding JSON Mode vs. Structured Outputs
00:38 Frameworks and Evaluation
01:27 Accessing Structured Outputs: Function Calling
02:56 Accessing Structured Outputs: Response Format Parameter
03:39 SDK Support and Use Cases
03:48 Generating Dynamic UIs with JSON Schema
04:59 Reasoning Steps and Data Extraction
06:31 Technical Details and Limitations
08:37 Availability and Final Thoughts
---
type: transcript
date: 2024-08-06
youtube_id: e48GPeq2NgA
---

# Transcript: OpenAI: New 100% Reliable Structured Outputs

OpenAI has just introduced structured outputs within their API, and you might be thinking: don't they already have this? They have something called JSON mode, which was released during their Dev Day last year, but the big difference is that JSON mode did not guarantee the output would conform to the particular schema you provided. They mention within the blog post that developers have long been working around the limitations of LLMs in this area via open-source tooling, prompting, and retrying requests repeatedly to ensure that model outputs match the formats needed to interoperate with their systems. Basically, what they're talking about here are frameworks such as LangChain, LlamaIndex, and Instructor; all of these different frameworks aim to solve this problem.

They also mention within the blog post an evaluation on a complex JSON schema where the new model scores a perfect 100%, whereas gpt-4-0613 scores less than 40%, and you can see the results listed out within their blog post. This is going to unlock a ton of use cases, things like generative UI, data extraction, and agentic workflows. There's just really a ton to explore now that we can rely on the output we get from the API, because previously you had to do all of these different pieces on the back end to make your application fault tolerant.

In terms of how to use structured outputs, there are two different ways you're going to be able to access it. The first is function calling. Here's an example with their chat completions endpoint where you're sending in messages just like you typically would, the system and the user, and you're passing in the tools that you're using; this one shows a relatively complicated schema, and they show you the output here as well. The benefit of something like this is that we now have something much closer to traditional programming.
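To make the function-calling route concrete, here's a minimal sketch of a tool definition with strict mode enabled. The tool name and fields are hypothetical, not the exact schema shown in the video.

```python
# Hypothetical strict tool definition: "strict": True opts this tool into
# structured outputs, so the model's arguments must match the schema exactly.

def build_order_tool() -> dict:
    """Build a function-calling tool definition with strict mode enabled."""
    return {
        "type": "function",
        "function": {
            "name": "get_order_status",  # hypothetical function name
            "description": "Look up the status of a customer order.",
            "strict": True,  # opt this tool into structured outputs
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string"},
                    "include_history": {"type": "boolean"},
                },
                # strict mode expects every property listed as required,
                # with additionalProperties disallowed
                "required": ["order_id", "include_history"],
                "additionalProperties": False,
            },
        },
    }

# This dict would go in the `tools` list of a chat completions request.
tool = build_order_tool()
print(tool["function"]["name"])  # get_order_status
```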
We don't need all of these different output parsers from all of these different frameworks; we can just rely on the output, with some degree of confidence that we're going to get the proper structured schema back from the response. One thing I want to mention about the 100% reliability: this isn't to be confused with 100% accuracy. The model can still hallucinate, so even if you specify the schema that you want, there's still the possibility that particular values you ask for within the schema could be hallucinated. With that being said, GPT-4o is the frontier model out there right now, so you can, with a certain degree of confidence, expect that it's going to do what you ask accurately, but it's definitely not going to be 100%; there's still going to be that possibility of hallucinations. Just something to keep in mind.

The other way you can access this is through the response_format parameter. To give you an idea, here's an example of a schema you could provide when you're asking for a structured output, and similarly, here is the output. If we look at this back and forth, we see the response_format, we see json_schema, just like they mentioned within the blog post on how you access it: it's an object, and its properties include steps with the type of an array. Looking at the output briefly, we see an array of steps and then the different objects within it.

Another thing to note is that structured outputs are safety-aware: the model will refuse if it detects an unsafe request or a violation of their safety policies. There is native SDK support; it's available within their Python SDK as well as their Node SDK, and there are a couple of good examples in here. Here's one: "You are a user interface assistant. Your job is to help users visualize their website and app ideas." You can look at how they prompted the model here; we're asking for a JSON schema called UI.
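A minimal sketch of what that response_format payload might look like for the generative-UI idea: the component's type is an enum of HTML elements, and children recurse back to the same schema. The field names ("label", "children") and the element list are assumptions for illustration, not OpenAI's exact example.

```python
# Hypothetical response_format payload for a generative-UI schema.
# Component "type" is constrained to an enum of HTML elements, and each
# child is itself a UI component via a recursive $ref to the schema root.

ui_response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "ui",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "type": {
                    "type": "string",
                    "enum": ["div", "button", "header", "section", "field", "form"],
                },
                "label": {"type": "string"},
                "children": {
                    "type": "array",
                    "items": {"$ref": "#"},  # recursive: children are UI components
                },
            },
            "required": ["type", "label", "children"],
            "additionalProperties": False,
        },
    },
}

print(ui_response_format["json_schema"]["name"])  # ui
```

This dict would be passed as the response_format argument of a chat completions request.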
Then we're going to have a dynamically generated UI. We're using strict mode, the schema is an object, and all the way through we're using enums based on HTML elements, which I think is a really creative way to leverage something like this: you can say it has a div, a button, a header, etc., and by doing something like this you can really easily build a generative UI app. You can see here all of the different outputs, and then all of the different UIs: a landing page for a gardener, a signup screen for an app, and a stock widget, all just based on that initial system prompt and this JSON schema. This can go a pretty long way, because these three outputs are completely generated from the same schema and the same system message. This one is really interesting because this is, to some degree, what Anthropic is using within their Artifacts feature.

Another cool use case is actually getting the reasoning for the final response. Here you can see the reasoning steps for a particular query: it walks through all of the different steps and then gives you that final answer, and again, this is all just based on the JSON schema that's provided. You can see the description is the reasoning steps to the final conclusion, that the reasoning steps have the type of an array, and if I go back to the structured output, it's an array of all the different steps it's essentially thinking through to get that final answer.

Next, extracting structured data from unstructured data. I think this is going to be a huge use case. It's pretty self-explanatory, but there are a ton of use cases for taking unstructured data and making it structured, and this just makes it that much easier.
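Because the schema is guaranteed, the model's JSON can be loaded straight into typed objects with no output-parsing framework in between. A minimal sketch, assuming an action-items schema with description, due date, and owner fields (the exact field names are assumptions):

```python
import json
from dataclasses import dataclass

@dataclass
class ActionItem:
    description: str
    due_date: str
    owner: str

def parse_action_items(raw: str) -> list[ActionItem]:
    """Turn a structured-output JSON string into typed objects directly."""
    data = json.loads(raw)
    return [ActionItem(**item) for item in data["action_items"]]

# A response shaped like the assumed schema above:
raw = ('{"action_items": [{"description": "Send the meeting recap", '
       '"due_date": "2024-08-09", "owner": "Sam"}]}')
items = parse_action_items(raw)
print(items[0].owner)  # Sam
```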
To give you an idea of how this could work, you could say: extract the action items, due dates, and owners from the meeting notes. Again, we pass in that schema, and then you're able to get the action items within this array; you can see the description, the due date, as well as the owner. Imagine passing in a transcript from a call or something like that: it would let you generate an output just like this. Overall, the DevX on something like this is definitely top-notch; OpenAI has always been at the forefront of these new developer experiences.

A few more things from the blog post: they touch a little bit on how this works, and they took a two-part approach to improving reliability. First, they mention that they trained the newest model to understand these complicated schemas and how best to produce outputs that match them. The other piece is that, given that the models are inherently non-deterministic, they took a deterministic engineering approach to constrain the model's output and achieve 100% reliability. Essentially, there's the training or fine-tuning piece within the model itself, but there's also an engineering layer on top before the response is actually sent to you. If you're interested in some of the more technical details, they're within the blog post, which I'll link in the description of the video.

Before I close out the video, I wanted to touch on a few more things. There are a few limitations to keep in mind when using structured outputs. They mention that the first request to the API with a new schema is going to incur additional latency, but after that initial query it's going to be faster, with no additional latency penalty. The first request takes more time because the API is going to cache these artifacts for fast reuse later on; typical schemas take under 10 seconds to process on the first request, but more complex schemas may take up to a minute.
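The deterministic engineering layer described above runs inside the API; before structured outputs, developers approximated it client-side with validate-and-retry loops. A toy version of that client-side check, with the required keys assumed for illustration:

```python
import json

def validate_payload(raw: str, required_keys: tuple[str, ...]) -> tuple[bool, str]:
    """Return (ok, message): is raw valid JSON with all required top-level keys?
    Pre-structured-outputs tooling would retry the request when this failed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc}"
    if not isinstance(data, dict):
        return False, "top level is not an object"
    missing = [key for key in required_keys if key not in data]
    if missing:
        return False, f"missing keys: {missing}"
    return True, "ok"

# Assumed keys, echoing the steps schema discussed earlier:
ok, msg = validate_payload('{"steps": [], "final_answer": "4"}',
                           ("steps", "final_answer"))
print(ok, msg)  # True ok
```

With structured outputs, this kind of check (and its retry loop) becomes unnecessary for schema conformance, though not for factual accuracy.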
That's definitely something to be mindful of: you're going to want to aim for a schema that's robust and reliable, one you won't have to constantly tweak, when leveraging this new structured output capability. Like I already mentioned, the model can refuse the output, and if it reaches the max tokens it can also fail. They also mention that structured outputs do not prevent model mistakes: for example, the model may still make mistakes within the particular JSON object. It could get a step wrong, it could extract data incorrectly, there could be hallucinations; all of those pieces you have with an LLM are still there to contend with. The other thing is that structured outputs aren't compatible with parallel function calls: you'll have to set parallel function calls to false if you're using this within function calling.

Now, in terms of availability: with function calling, structured outputs are available on all of their models that support function calling, so gpt-4o, gpt-4o mini, gpt-4, or gpt-3.5 turbo, basically anywhere you want. If you're using structured outputs with the response_format parameter, it's available on gpt-4o mini as well as the new version of GPT-4o. The other great thing with this new capability is that they're also dropping the cost of input tokens as well as output tokens for the new GPT-4o model compared to gpt-4o-2024-05-13. Let me know what you think in the comments below; if you found this video useful, please comment, share, and subscribe. Otherwise, until next time.
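As a closing sketch of the failure modes mentioned above (a safety refusal, or a response truncated at max tokens), here's one defensive way to unpack a result. It operates on a plain dict shaped like one chat-completion choice; the field names follow the API's refusal and finish_reason fields, but treat the exact shape as an assumption.

```python
import json

def extract_structured(choice: dict) -> dict:
    """Parse structured output from a choice dict, surfacing the two
    failure modes discussed: safety refusals and max-token truncation."""
    message = choice["message"]
    if message.get("refusal"):
        # the model declined on safety grounds instead of emitting JSON
        raise ValueError(f"model refused: {message['refusal']}")
    if choice.get("finish_reason") == "length":
        # hit max tokens, so the JSON may be cut off mid-object
        raise ValueError("truncated at max tokens; JSON may be incomplete")
    return json.loads(message["content"])

ok_choice = {"finish_reason": "stop",
             "message": {"content": '{"steps": []}', "refusal": None}}
print(extract_structured(ok_choice))  # {'steps': []}
```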