
In this video, we delve into OpenAI's latest release, Codex, a cloud-based software engineering agent designed for a range of coding tasks. Unlike tools like Cursor or Windsurf, Codex integrates with GitHub, allowing natural language interactions for tasks like writing features, fixing bugs, and proposing pull requests. Powered by codex-1, a version of o3 (OpenAI's most powerful model to date), it is optimized for software engineering through reinforcement learning on real-world coding tasks. The video also covers practical usage, such as task assignment, environment handling, and task completion. Codex can commit changes, provide verifiable logs, and even open pull requests directly. It is initially available to ChatGPT Pro, Enterprise, and Team users, with support for additional user groups coming soon. Join us as we explore the capabilities and practical applications of Codex in modern software development.

- 00:00 Introduction to Codex
- 00:15 Overview of Codex Features
- 00:22 How Codex Integrates with GitHub
- 00:41 Codex's AI Model and Training
- 01:08 Access and Availability
- 01:18 Practical Examples and Use Cases
- 01:32 Task Completion and Environment
- 02:22 Commit and Verification Process
- 03:21 Guiding Codex with AGENTS.md
- 03:53 Performance Benchmarks
- 04:20 Final Thoughts and Accessibility
- 04:43 Conclusion and Viewer Engagement
---
type: transcript
date: 2025-05-17
youtube_id: Kd0QGZMy_tA
---

# Transcript: OpenAI Codex in ChatGPT in 5 Minutes

In this video, I'm going to be going over Codex. Just today, OpenAI released a cloud-based software engineering agent. One thing to note right off the bat: this is definitely not a tool that's going to replace something like Cursor or Windsurf. It is quite different. I'm going to go over the blog post and then we'll dive into some demos. Codex can do everything from writing features to answering questions about your codebase, even fixing bugs or proposing pull requests. One of the great things with the platform is the way it works: you integrate a GitHub repository, select the branch you want to work off of, and then work with the agent in natural language. You can see which PRs are open and which are merged, as well as all of the diffs and changes associated with each request. Codex is powered by codex-1, which is a version of OpenAI's o3 model. In other words, this is derived from the most powerful model they've released to date. This model is optimized for software engineering, and it was trained using reinforcement learning on real-world coding tasks. The model was trained to work almost as if it were a software engineer: to closely mirror human styles and PR preferences, and to adhere precisely to instructions. One thing to note is that they are rolling this out first to ChatGPT Pro, Enterprise, and Team users today, with support for Plus and Education users coming soon. So now let's go through a couple of different examples. The way this works is you can assign coding tasks by typing a prompt and clicking Code, or you can ask questions about your codebase by simply asking.
Every time you kick off a new task, it's processed independently in a separate, isolated environment preloaded with your codebase. You will have access to things like your environment variables. As you might expect, Codex can read and edit files, and it has access to the terminal to run commands as well as any test suites you might have or tests that Codex generates. One thing to note with task completion is that it can take anywhere from 1 to 30 minutes. Like I mentioned, this is not comparable to something like Cursor or Windsurf, where you're in the loop with the agentic IDE you're working with. While these tasks can take several minutes, you can spin off multiple tasks at once. Say you want to clean up different parts of your repo: you can ask for various things, and each task will spin up in its own environment, as if it were a separate engineer working on the codebase. In terms of the practical aspects, once Codex completes a task, it commits the changes to its environment and provides verifiable evidence of its actions through citations of terminal logs and test outputs. Similar to agent mode in something like Cursor or Windsurf, where you can actually see it run those terminal commands and get the output, you have the same confidence: you can run your test suite, try to build the repo, or whatever it is. The really great thing is you can even open a pull request directly from Codex.
That is a really nice touch, because one of the things with a lot of this AI-generated code, as they mentioned in their announcement, is that more and more, especially at these AI labs, traditional programmers are spending more time reviewing code than writing it, since a lot of the writing is increasingly done by these AI systems. Another great consideration is that you can guide your agents with an AGENTS.md file, which you place in the root of your repository. These are text files, akin to a README, where you can inform Codex how to navigate your codebase. If you've used something like Cursor, you might be familiar with Cursor rules, which effectively guide the agent's behavior depending on the task you ask of it; this is a similar concept. You can specify which commands to run for testing, how to best adhere to your project standards, and so on. Now, in terms of benchmarks, codex-1 is state-of-the-art on the SWE-bench Verified benchmark. What's interesting is that codex-1 performs quite well with fewer attempts, especially in the one-shot range; as the number of attempts increases, it does start to converge with o3-high, but mind you, both of these models are definitely state-of-the-art. Basically, it will go through the different files and update all of the respective areas, just like you've likely seen in some of these other tools. Finally, one thing I do want to mention is that this is going to be accessible from the ChatGPT app.
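To make the AGENTS.md idea concrete, here is a minimal sketch of what such a file might contain. The layout, commands, and tool names below are hypothetical placeholders for illustration, not taken from the video or the announcement:

```markdown
# AGENTS.md

## Project layout
- `src/` holds the application code; `tests/` holds the test suite.

## Testing
- Install dependencies with `pip install -r requirements.txt` once per environment.
- Run `pytest -q` before committing; all tests must pass.

## Style
- Follow PEP 8; format with `black` and lint with `ruff` before opening a PR.
- Keep each PR focused on a single change, with an imperative commit message.
```

The point is that the agent reads this file on each task, so repo-specific conventions travel with the code rather than being repeated in every prompt.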
That is one thing, especially as they mentioned in one of the videos in their blog post: say you're on call and you quickly have to check something in a repo, or maybe you want to paste a stack trace into Codex; it will go ahead and start trying to iron out what the potential issue is. Otherwise, that's pretty much it for this video. I just wanted to do a really quick one going over Codex. I'm really curious about everyone's thoughts, especially with the direction that all of these coding agents are going. Let me know what you think in the comments below. And if you found this video useful, please comment, share, and subscribe. Otherwise, until the next one.