
Getting Started with OpenAI's New TypeScript Agents SDK: A Comprehensive Guide

OpenAI has recently released their Agents SDK for TypeScript, and this video provides a detailed walkthrough of how to get started. It covers the basics, from installing the SDK to creating simple agents and equipping them with tools. Examples include creating a basic agent, using a weather function call, integrating the FireCrawl API for web searches, leveraging structured outputs, and setting up teams of agents to coordinate tasks. The video also demonstrates advanced features like streaming responses and incorporating a human-in-the-loop for approval processes. This guide is for anyone looking to understand and use the fundamental building blocks of OpenAI's new framework.

Docs: https://openai.github.io/openai-agents-js/
Repo of examples coming shortly!

- 00:00 Introduction to OpenAI's TypeScript SDK
- 00:20 Setting Up Your Environment
- 00:49 Creating a Basic Agent
- 01:56 Adding Tools to Your Agent
- 05:11 Implementing Structured Outputs
- 08:06 Combining Tools and Structured Outputs
- 09:53 Coordinating Multiple Agents
- 12:30 Streaming Responses
- 13:11 Human-in-the-Loop Example
- 15:46 Conclusion and Future Directions
---
type: transcript
date: 2025-06-07
youtube_id: jJgxcIbEWVY
---

# Transcript: OpenAI's New TypeScript Agents SDK

OpenAI has just released their Agents SDK for TypeScript. So in this video, I wanted to go through a number of examples of what the SDK looks like, as well as how you can get started with a number of the fundamental building blocks.

First thing to get started, you do have to make sure you have Node.js installed. Once that's installed, you can go ahead and install the Agents SDK just like it's shown on the screen here. Once that's all installed, I'm going to go through a number of different examples to show you what the SDK looks like and how you can get started with different use cases.

Right off the bat, we're going to import two different pieces from the Agents SDK: Agent as well as run. Agent is how we set up the agent, which we'll see in just a moment. And run, as you might expect, is how we actually invoke the agent.

In terms of creating a simple agent, this is the simplest form of an agent that you can create. Effectively, all that we need to do is define a variable for the new agent, instantiate the Agent class, and give our agent a name as well as some instructions. That's all we need to have an agent. Now, mind you, this agent isn't going to have any tools or anything like that; I'll go through that in the next examples. Then, to get the results from this agent, we can await the run method, passing in the agent that we defined as well as whatever prompt we want to send to it.

So now, if I run this script, we can see the agent's response console logged here: the capital of France is Paris. If I change this out to something like the capital of Denmark, I can go ahead and run that again.
There we go. We have the answer coming back from the large language model. Now that we have the simplest version of an agent in the framework, let's define a tool so the agent can actually do something. For this, we're going to import Agent, run, and also tool, so that we can define the custom tools we want to set up.

First, we can define a tool. Effectively, there are two parts: how we describe it in natural language, and the Zod schema for its parameters. For instance, we essentially tell the large language model, "Hey, this is a tool that gets the weather for a city." We can't just define the function itself, because the description is the key part that the LLM latches onto to understand what the tool does. Zod, in turn, is how we define the different parameters or arguments that the function takes. In this case, it's a relatively simple structure: we're passing in an object with a key of city, whose value is a string with the city's name.

Directly below that, where we define what the function actually does, is where that function gets called. When the LLM determines that there is a tool call to invoke, this is the actual logic that gets invoked. Within this function, we have the city name, which comes from the LLM: the LLM decides, "Okay, they're asking about this city," and that's what gets passed into this argument. In here, we just have some placeholder data: if it's New York, London, or Tokyo, we return these set numbers, so we can test it just to make sure that it works.
Otherwise, if it isn't one of those cities, we just say it's 68 degrees and sunny. From there, similar to the previous agent, we can create our agent. The one difference is that instead of just the name and instructions, we now have this array of tools. If we were to define another tool, we could add a comma here and simply pass that tool in as well.

From there, we define our asynchronous main function; the reason we do this is that we always have to await the results. As before, we pass in the agent that we defined as well as the prompt, and then we just log out the results. Now, if I run our script here with tools and ask "What's the weather in New York?", we can see that New York is 72. And if I ask about London instead and run that again, we see the weather in London is currently 61° and sunny, matching exactly what's defined in the function. If you haven't used or aren't familiar with tool calling, this is where you put all of that logic, whether it's API calls or whatever else you want to do.

Next up, we have a similar example, but instead of just leveraging dummy data for our weather function call, we're going to use the FireCrawl API. To do this, we install the @mendable/firecrawl-js library. What this allows us to do is leverage a number of different methods to extract targeted data from web pages as well as search results.
What we're going to do here is define our main function, again an asynchronous function, since we're going to be awaiting whatever the agent does. Then we initialize our FireCrawl API client. The one thing to note is that you do have to add your environment variables in a .env file. You can touch to create that file, and once you're in it, put in your own OpenAI API key as OPENAI_API_KEY (get your API key from OpenAI), and similarly your FireCrawl key as FIRECRAWL_API_KEY. Paste in your keys there and you should be good to go.

One quick aside that I just want to point out in case anyone runs into any issues: I'm running Node version 22. The way the newer versions of Node work, for versions 20 and higher I believe, you don't actually need something like dotenv to be able to access .env files. If you do run into issues and you have an older version of Node, 18 for instance, just make sure that you either bump it up or import something like dotenv to access your environment variables.

From there, we can define a web search tool. This follows a similar process to the weather tool, but in this case we say this tool searches the web for information, and it takes in a query of what to search for. Inside, we leverage the FireCrawl search method. What the search method returns includes things like the title, the URL, and the description of each page, and we can even pass in a parameter if we want the scraped contents of the pages as well. As before, we create the agent and equip it with the tool we just defined.
Now, when we invoke the agent, if the prompt is related to searching the web for information, the agent extracts the query from that prompt, passes it to the tool to get the results, and in this case returns the metadata of all of those different results. From here, I can run the agent and get the output for the FireCrawl search example. Our query was to search for the latest news on AI advancements. In this example, you can begin to see how you can leverage these different tools and equip your agents with things that are actually useful to the application you might be building.

Next up is structured outputs. If you haven't used structured outputs before, they're super helpful because they let you take natural language, whether it's a piece of text or a response you're expecting back from an LLM, and ask for it in a deterministic format. For instance, say you have a schema that your application maps to: a weather UI element with the type of weather (raining or sunny), the temperature, and all of those pieces mapped out within your UI as variables. What structured outputs allow you to do is send in something like a report or an article and say, "Okay, based on this article, I want you to extract these key pieces of information: the title, the date, and a summary of the contents." Structured outputs will give you back that schema, which you can then map to whatever the fields are within your application.

In this example, we're asking for things like the product name, the category, the price, the features, pros, cons, a rating, as well as a recommendation. From there, we can define our agent.
So, in this case, instead of passing in tools, our agent just has the name and instructions like we've gone through. Mind you, you can mix and match all of the different pieces I'm showing you: if you wanted the FireCrawl tool as well as the weather tool, you could pass those in alongside structured outputs where appropriate. Then all we need to do is pass in the schema we defined as the output type. Effectively, that says: here are all of the different fields; this is what I want the output to look like.

The one thing to know with structured outputs is that you are always going to get that schema back; the output will map to it. It's a little bit different from JSON mode, if you've used that before. The model can still hallucinate the values of the fields, but at least you can rely on whatever schema you defined.

From there, we define our asynchronous main function and await our agent. In this case, we'll say "Analyze the iPhone 15 Pro and provide a detailed breakdown," and then print out the response in the structured format. I'll run this structured outputs example, and here we go. Note that this is information based on the context within the model. If it were something about recent events or something that isn't in the training data, that's where tools can be useful: you could leverage a web search tool to find that information before ultimately returning this structured schema.
And where this is helpful, as you can imagine, is if you have an application or a UI: you can map all of these different key-value pairs to whatever you have within your UI. That's where this is extremely powerful.

Next up, I'm going to show you how you can leverage structured outputs in combination with tools. We import a number of the packages we've already used and define an output schema with the topic, key findings, sources, trends, recommendations, and when it was last updated. This shows how you could leverage the SDK for a research use case: for the pages you search, you just want to know the key findings, sources, trends, recommendations, and so on, so you define a structure like this.

Again, we set up our asynchronous main function and initialize the FireCrawl client, since we're going to be leveraging it for our search tool. The query passed into the tool's argument is what we pass to the FireCrawl search API. We get the top five results, along with the markdown contents of each of those respective pages, and then we convert everything we get back from those search results into a string. Within our agent, we have the name, instructions, and tools like we've gone through, and now, in combination with the tools, we also have the output schema we defined at the beginning of this example. If I run our agent and log this out, here is our research analysis.
And again, you can really define whatever schema you want for what you want to extract. We have the topic, recent developments in large language models; we have all of the key findings; we have the sources it leveraged for all of this; and we also have the trends it determined. Basically, all of the different pieces we defined within our schema.

Now, I'm going to show you how you can have teams, or swarms, of different agents. We can define a number of different agents, but then also have a coordinator to make all of those agents follow the particular instructions of whatever we want to do. We'll define some of the same pieces from the previous examples, and we'll have a simple search tool for the specialist agents.

This is where you can begin to think about agents almost as if they were employees within an organization or on a team. In a developer context, you could have a front-end agent, a back-end agent, a database specialist, potentially a QA engineer, however you want to structure your agentic dev team. That's the sort of idea you could leverage with something like this.

Here, we're just going to have a data collector and an analyst. The data collector has access to our search tool, and the handoff is how we determine when we actually pass work to the other agent. For the specialist agents, we can also set a handoff description. Similar to a function call's description, this is how we tell the coordinator agent when to leverage each particular agent: when you need to collect data from web searches, send the request to this agent; if you need to analyze the data, send it to this one.
And then how this all ties together is that we set up a coordinator agent. Similar to tool calls, we pass the agents we defined into the coordinator and give it some specific instructions: coordinate research projects, understand what the user wants, hand off to the data collector to gather information, then hand off to the analyst to analyze the findings, and finally provide a summary. For the research coordinator, we have very clear, linear instructions on the different agents as well as the steps that need to occur at each phase: we know to hand off to the data collector after understanding the user's request, and once we have that context, we pass it off to the analyst to analyze the data.

Now, if we run this example (and if I just make this a little bit bigger), we can see our analysis. This is the final synthesized report, where we have the accelerated growth, the market share, the leading markets, and so on. This is a really powerful and clean way to leverage different agents in combination with one another to ultimately get a final report for whatever you want to build.

Next up, I'm going to quickly show you a streaming example. Of course, as you might expect, agents do have the ability to stream out their responses. When we call the run method, we pass the agent we defined just like before, then the prompt, and then as the third argument an object where we specify that we want to stream out the response. From there, we can loop through all of the different chunks we get back from the stream; in this case, we'll just write them out to the console.
But generally speaking, if you are leveraging streaming, it's probably within the context of streaming in the UI of an application. Here, we can see the whole response streaming out, so we don't have that latency of just waiting for one big block of text at the end.

All right. So, next is maybe one of my favorite examples of the SDK, and this is a human-in-the-loop example. Where human-in-the-loop is useful is, let's say you have an agent or a workflow that writes a blog post for you every day, but you don't actually want it published until you review it, while still having that agentic process occur. What you can do is set your agent up to say, "Okay, once this is done, interrupt and have a human intervene and determine whether to approve or reject it." This way, it basically sets up a draft of, say, a blog post, but doesn't actually push it live until you give it the explicit okay.

The one new piece we're going to leverage here is RunState, and we'll also use readline, since we're just demonstrating this within the terminal. First, we define a publishing tool. The key parts are similar to how we defined our other tools: we have the name, the description, the parameters, as well as the function itself. The one new piece is the needsApproval flag. We just set needsApproval to true, and that unlocks the human-in-the-loop capability. That's all we need to do. In this case, we have a content publisher agent, and we equip it with the tool that publishes content to our website. The key thing with this approval flow is that the tool's function isn't actually invoked until we have explicitly said that it's approved.
If we just go through and run this, we create a readline interface so we can interact with the terminal, and we pass in the prompt "Publish a blog titled 'Introduction to AI Agents'" along with our agent. The key piece of how this works is that the results object has an interruptions key, and with that key we have the state, which we can set to approved or rejected. Once we've met one of those conditions, we call the agent again with the resulting state. I'll just go ahead and run this human-in-the-loop example. Here we have our generated blog post and the question of whether I want to approve it. If I say yes, we can see that it's been approved and we have the final result: the blog post with that title has been published. That's the point where it actually invokes the function we defined.

So that's pretty much it; I just wanted to do an intro to the SDK. The one thing that I do want to note is that they also have real-time as well as voice capabilities within the TypeScript SDK if you're building voice applications. I'm not going to be demonstrating that within this video; I'll probably do some more examples with the Realtime API in the future. In this one, I just really wanted to focus on the fundamentals as well as the abstractions they've built out with this new Agents SDK. If you found this video useful, please comment, share, and subscribe. Otherwise, until the next one!