
Star the Repo! 🌟 https://github.com/mendableai/open-researcher

In this video, I explore an exciting new feature called interleaved thinking, recently released by Anthropic alongside Claude 4 and Claude Code. This advanced capability allows AI agents to think between tool calls, making them more adaptive and intelligent in their problem solving. Learn how this beta feature changes the landscape of AI development and see a practical demonstration using an open-source Next.js template. We also cover how to set up this feature in a few simple steps and integrate it into your own projects. Don't miss out on discovering the future of AI agents!

Links:
https://www.firecrawl.dev/
https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#interleaved-thinking

00:00 Introduction to Interleaved Thinking
00:17 How Interleaved Thinking Works
00:46 Traditional Agents vs. Interleaved Thinking
01:15 Open Sourcing the Implementation
01:52 Building an AI Agent with Interleaved Thinking
02:35 Detailed Walkthrough of Interleaved Thinking
07:20 Getting Started with the Example
08:06 Conclusion and Future Prospects
---
type: transcript
date: 2025-07-02
youtube_id: e-Yx3CBE22A
---

# Transcript: The AI Feature Nobody's Talking About (But Should Be)

In this video, I am super excited to open source Open Researcher. This is a project that I'm working on, and the reason I'm particularly excited about it is that it leverages something called interleaved thinking, a beta feature released by Anthropic. It came out at the same time that Claude 4 was announced, and as part of that announcement they also announced Claude Code, which caught a ton of attention, rightfully so. But interleaved thinking is one that sort of fell under the radar, in my opinion. What I wanted to do is build out a Next.js template that is completely open source, equipped with everything you need to get started with interleaved thinking. The way this works is that I've equipped the application with the ability to search the web, similar to a Google search, but also to scrape the page; in this case, we're actually getting a screenshot of the page to render this view on the screen here. What's really neat about interleaved thinking is that it allows you to create AI agents without having to worry about setting up a backend orchestration process. One thing I do want to say is that this doesn't necessarily replace the need to set up orchestration in your agent architecture, but it is absolutely another tool in the toolkit. Now, to break down how this works, we have the query: find the second sentence of the first and third blog posts on firecrawl.dev, then compare the pricing for OpenAI's o3 as well as Claude Sonnet 4. The first thing we get back is the thinking returned from the model: "the user is asking me about..." It breaks down everything within that query and then decides, okay, let's start by searching for the blog posts on firecrawl.dev.
From there, since we've equipped it with the web search functionality, we send in the query to search the Firecrawl blog. We get back the proverbial 10 blue links, just like you would from Googling something. Once those links are in the context, the model determined, in this case, "I need to go to the main blog page to get all of the blog posts in order," which makes sense. But let's say the search results didn't yield anything particularly useful; the agent can determine, "maybe I just need to refine that search query and try again," before it actually goes through and does a deep scrape of the page. It will iterate through this process. What's really neat is that you can send in complex queries, and on the back end you don't need to route or classify the different queries or define the agent orchestration; the model can handle all of that by thinking through the process and building on the context of what it's found until it ultimately returns the result for you. Now, if you're interested in trying out what I just demonstrated, I'm open sourcing this under an MIT license, so you can leverage it, put it in an application you have, build a product out of it, do whatever you'd like with it. To touch on some of the fundamentals of how this works, I thought it would be helpful to show you the relevant Claude documentation. Within the Claude documentation there is a tab for extended thinking. If I go down to "interleaved thinking with tools," we'll see the documentation that relates to exactly what I just showed you: Claude 4 models support interleaved thinking, which enables Claude to think between tool calls and do more sophisticated reasoning after receiving tool call results.
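The search → reason → scrape loop described above can be sketched roughly like this. This is a hypothetical illustration, not the template's actual code: `searchWeb`, `scrapePage`, and `decideNextAction` are stand-ins (in the real app the decision step is the model's interleaved thinking, and the tools hit Firecrawl's APIs).

```typescript
// Hypothetical sketch of the agent loop: search, reason over results,
// scrape, and repeat until the model decides it can answer.

type Action =
  | { kind: "search"; query: string }
  | { kind: "scrape"; url: string }
  | { kind: "answer"; text: string };

interface SearchResult { title: string; url: string }

// Mock tools -- the real template calls Firecrawl's search/scrape APIs here.
function searchWeb(query: string): SearchResult[] {
  return [{ title: "Firecrawl Blog", url: "https://www.firecrawl.dev/blog" }];
}
function scrapePage(url: string): string {
  return `Contents of ${url}`;
}

// Toy decision function standing in for the model's thinking step:
// search first, then scrape the top result, then answer.
function decideNextAction(context: string[]): Action {
  if (!context.some((c) => c.startsWith("search:"))) {
    return { kind: "search", query: "firecrawl blog posts" };
  }
  if (!context.some((c) => c.startsWith("scrape:"))) {
    return { kind: "scrape", url: "https://www.firecrawl.dev/blog" };
  }
  return { kind: "answer", text: "Done: gathered blog contents." };
}

function runAgent(): string {
  const context: string[] = [];
  // Iterate -- refining searches or scraping deeper -- with a step cap
  // so a confused agent can't loop forever.
  for (let step = 0; step < 10; step++) {
    const action = decideNextAction(context);
    if (action.kind === "search") {
      const results = searchWeb(action.query);
      context.push(`search: ${results.map((r) => r.url).join(", ")}`);
    } else if (action.kind === "scrape") {
      context.push(`scrape: ${scrapePage(action.url)}`);
    } else {
      return action.text;
    }
  }
  return "Gave up after 10 steps.";
}
```

The point of the sketch is the shape of the loop: the decision function sees the accumulated context each iteration, which is exactly what interleaved thinking gives you for free between tool calls.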
Just like you saw there, as soon as it thinks through what it needs to do, it determines when to search and when to scrape, and based on those results it can decide: do I need to search more? Do I need to scrape more? Do I ultimately have the result of what the user is asking for? So, as the docs describe, with interleaved thinking Claude can reason about the results of a tool call before deciding what to do next; you can chain multiple tool calls with reasoning steps in between; and it makes more nuanced decisions based on intermediate results. This isn't set up so that we just search an arbitrary number of times or with a fixed set of strategies; it determines on the fly and adapts to whatever it finds, or doesn't find, during our query and research process. One thing to know is that interleaved thinking is still in beta, so you will have to pass in the beta header for it. Among the important considerations: you can set the thinking budget in tokens, and because the feature is in beta, if you try this on Amazon Bedrock or Vertex AI your request will fail, so you will have to use the Anthropic API directly. Now, to show you how interleaved thinking works, the documentation has a great example defining a calculator tool as well as a database tool. The setup is very familiar from a typical chat-based application: we have our messages array, and there are some optional settings where we can define how long we want the model to think between steps. We wait for the results to come back, loop through the response, and based on the type of each content block, push it, whether it's a thinking block or a tool use block, to the respective array.
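Putting the pieces just mentioned together, a request with interleaved thinking enabled might be assembled like this. This is a sketch based on the Anthropic docs at the time of writing: the beta header value, model name, and the calculator tool schema are illustrative, so check the current documentation before relying on them.

```typescript
// Sketch of the request parameters for interleaved thinking.
// Header value and tool schema are assumptions drawn from Anthropic's docs.

interface ThinkingConfig { type: "enabled"; budget_tokens: number }

interface RequestParams {
  model: string;
  max_tokens: number;
  thinking: ThinkingConfig;
  tools: { name: string; description: string; input_schema: object }[];
  messages: { role: "user" | "assistant"; content: unknown }[];
}

function buildRequest(messages: RequestParams["messages"]): {
  headers: Record<string, string>;
  params: RequestParams;
} {
  return {
    // Interleaved thinking is in beta, so the request must carry this header.
    headers: { "anthropic-beta": "interleaved-thinking-2025-05-14" },
    params: {
      model: "claude-sonnet-4-20250514",
      max_tokens: 8192,
      // budget_tokens caps how much the model may think between tool calls.
      thinking: { type: "enabled", budget_tokens: 4096 },
      tools: [
        {
          name: "calculator",
          description: "Evaluate an arithmetic expression",
          input_schema: {
            type: "object",
            properties: { expression: { type: "string" } },
          },
        },
      ],
      messages,
    },
  };
}
```

In a real app you would hand these params to the Anthropic SDK's messages endpoint; here the function just shows which knobs the transcript is talking about, the beta header and the thinking budget.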
The pattern from there is that all we're doing is adding to this messages array; we're not touching any of the parameters above, we're just managing the messages context. To inform the model about the tool call results, we pass the relevant responses from the previous turn back in as part of the assistant message: we reference the thinking block, which is the thought process returned from the previous response, along with the tool use block and the results returned from the actual function invocation. You can think of this messages array as essentially the context you're engineering to ultimately see if the original query has been satisfied. And that's pretty much it: you can set up the backend to autonomously keep working through the query until it actually satisfies the request. Finally, once the thinking portion is deemed complete, we can specify to return that block of text and stream it to the front end of the application. Now, in terms of getting started with the example I showed you: you can fork the repo, and I'd definitely love a star if you like these types of projects. Once you've cloned the repository, you can npm install everything, or use pnpm or Bun, whichever you prefer. All we need to do to get started is create an API key on Anthropic. I'll go ahead and create that, then create a .env file, add ANTHROPIC_API_KEY, and paste in the key. Additionally, we'll grab our API key from Firecrawl, and I'll plug that in here as well. Once we have that, we can run npm run dev.
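The context-management pattern described above can be sketched like this, with block shapes modeled loosely on the Anthropic Messages API (simplified here; the real blocks carry extra fields such as signatures on thinking blocks):

```typescript
// Sketch of the "just keep appending to messages" pattern: after each model
// turn, echo back the assistant's thinking and tool_use blocks, then add a
// user message carrying the matching tool results.

type AssistantBlock =
  | { type: "thinking"; thinking: string }
  | { type: "tool_use"; id: string; name: string; input: object }
  | { type: "text"; text: string };

type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string };

interface Message {
  role: "user" | "assistant";
  content: string | AssistantBlock[] | ToolResultBlock[];
}

function appendToolTurn(
  messages: Message[],
  assistantBlocks: AssistantBlock[],
  toolResults: { tool_use_id: string; content: string }[]
): Message[] {
  return [
    ...messages,
    // The thinking and tool_use blocks go back exactly as they were returned.
    { role: "assistant", content: assistantBlocks },
    // Tool results return in a user message, matched by tool_use_id.
    {
      role: "user",
      content: toolResults.map((r) => ({ type: "tool_result" as const, ...r })),
    },
  ];
}
```

The backend loop then just calls the API again with the grown array, and repeats until the model stops requesting tools and returns its final text.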
We can open up our development server, and from there I can send in a query. We'll see it animate through and begin the research process. Overall, that's pretty much it for this video. I was super excited for the team at Anthropic to release this, and I really hope we see it from other API providers, whether that's Gemini or OpenAI. It's a really neat feature, being able to reason through and have tool calls within the thinking process. Otherwise, if you found this video useful, please comment, share, and subscribe. Until the next one!