
Learn more about LLMWhisperer: https://unstract.com/llmwhisperer/
Try LLMWhisperer for FREE: https://pg.llmwhisperer.unstract.com/
Try Unstract for FREE: https://unstract.com/start-for-free/
Unstract open source: https://github.com/Zipstack/unstract

LLMWhisperer is a text extractor by Unstract, a no-code LLM platform for launching APIs and ETL pipelines that structure unstructured documents.

Watch our previous video on Unstract, "AI Document Parser: Extract data from complex PDFs at scale! (Open Source)": https://www.youtube.com/watch?v=Ymq8o7FSoVc&t

In this video, I'll introduce you to LLMWhisperer, a developer-friendly tool that transforms how your application can process and extract information from complex documents. 🍀

LLMWhisperer bridges the gap between messy real-world documents and the structured inputs that large language models (LLMs) need. It's designed to handle forms, complex layouts, and handwritten content. One of the best parts? You can try it out with up to 100 pages per day for free — no credit card needed!

I'll walk you through setting up an account, exploring the dashboard, and using the playground to upload documents. 📋 We'll cover powerful use cases, such as handling handwritten forms, receipts, and even off-orientation images like driver's licenses. This tool can preprocess documents, making it more efficient and cost-effective to feed them into large language models. I'll also demonstrate how easy it is to integrate LLMWhisperer into a simple app using a Next.js boilerplate.

Don't miss out on this revolutionary tool! 🌟 If you found this video helpful, please comment, share, and subscribe!

Timestamps:
00:00 Introduction to LLMWhisperer
00:42 Getting Started with LLMWhisperer
01:39 Exploring the Playground and API Keys
01:55 Use Cases and Applications
06:14 Setting Up a Boilerplate Application
07:25 Implementing LLMWhisperer in Your App
09:28 Testing and Demonstration
10:26 Conclusion and Final Thoughts
---
type: transcript
date: 2025-04-29
youtube_id: mvLFi8ea04s
---

# Transcript: Get complex documents LLM-ready with LLMWhisperer (100 pages free/day)

In this video, I'll introduce you to LLMWhisperer, a developer-friendly tool that transforms how your application can process and extract information from complex documents. LLMWhisperer bridges the gap between messy real-world documents and the structured inputs that LLMs need to perform at their best. LLMs are powerful, but their output is only as good as the input that you provide. Whether you're working with data from forms, complex layouts, or even handwritten content, this tool offers powerful capabilities that make document processing even more effective. One of the best parts is that you can try this out with up to 100 pages per day, completely for free. You don't need a credit card to try any of this out.

To get started, you can sign up for the forever-free plan and create an account. Once you've made an account, this is what the dashboard looks like. We have a playground, which I'll show you in just a moment. We also have a place to get our API keys, and the endpoint URL that we'll make requests against. Additionally, if you're curious about the file types that the different plans include, you can check out all of the supported formats here. We have things like PDFs, Microsoft Office documents, and LibreOffice documents. Basically, most of the document types you could imagine, all the way up through different image types. For instance, if you have documents that you've photographed with a smartphone camera, as a JPEG or PNG or whatever it might be, there's an option covering basically whatever type of document you have.
You can check out all of the nuanced differences between the tiers. And finally, we have our dashboard, with things like usage history and logs. Now, if I head over to the playground, you can upload your documents directly here. Let's say you have a handful of documents that you want to test out: you can go and upload them, and with the free tier you can process up to 100 pages of documents per day this way. One of the powerful use cases is leveraging this for forms with handwritten data. If we look at the form here, we can see that its structure is relatively complicated. One of the tough things with this type of document, especially when we're trying to feed that context into a large language model or store it in a database in a structured format, is that it contains a bunch of handwriting. The really cool thing is that if we look at this form side by side with the extracted output, what you'll notice is more or less continuity in the structure from left to right. For instance, where the form has name: first, middle, last, and suffix, the output likewise has name: first, middle, last, and suffix. The form also has checkboxes: where the form has the checkbox for "I am applying for individual credit," the output has that same checkbox for "I am applying for individual credit." You can try this out with your own handwritten documents and see how it structures similar things. This can be helpful in a number of different contexts. One potential use case for LLMWhisperer is if you're building out some sort of AI chat application.
Depending on the application, one way you could potentially save on inference cost, rather than sending these types of documents to an expensive model like Claude Sonnet or one of the o-series models from OpenAI, is to preprocess the document with something like LLMWhisperer beforehand and potentially even store the result. The benefit is that you get a representation of the document, structured as text, in a way that the LLM is going to understand. By taking this approach, we can effectively take that document and have it ready for when the user ultimately sends in their query: while they're typing it out, once they've attached an image, we can preprocess it and get all of that context ready to be passed into the LLM. And then there's the cost. While you could put a document like this into a model like o3 or o4-mini, there would definitely be an increased cost associated with passing an image into the large language model. Another great use case that I personally love is receipts. For anyone who's self-employed, one thing I've found is that I often take pictures of a lot of the receipts I have, and actually going through and processing all of those documents is quite time-consuming. What you can do with something like LLMWhisperer is take photos of all of those receipts, upload them, and get a text representation of each document. That ultimately makes things like filing taxes easier, and it also helps if you're in a department where you have to handle a bunch of employees' receipts that come through in email. With a pipeline or a system built on LLMWhisperer, you'll be able to process documents just like this.
Just to show you some of the flexibility, here's a document of an invoice. It even contains a representation of a calendar; we can see the new balance, the minimum payment due, and so on. And the output also captures the handwritten note that's directly on the document. One more document I wanted to show you is an off-orientation image of a driver's license. What's great is that you don't have to worry about whether something is rotated the proper way: you'll be able to pass in a robust set of documents, and even if one is on its side like this, it will still extract the various information in a structured format, like you saw with the other documents. In terms of trying out LLMWhisperer within your application, they have a really robust set of documentation. They have a Python as well as a JavaScript client, and if you want it within a no-code builder, you can use it within something like n8n as well. Here's an example of a typical workflow. Let's say you have a folder where you save out all of your documents. At a particular interval, via a scheduled trigger, we could read through something like a Google Drive folder, send all of those files to LLMWhisperer, convert them to their text representation, and then save them out. That could be into your database, or you could even put them in text files adjacent to each image, whatever it might be. Now, I just want to show you how easy it is to get started with leveraging this. I'm going to set up a boilerplate application and spin up a simple hello-world example with LLMWhisperer completely from scratch. Here I have a boilerplate Next.js application. I'm using Cursor, but you could use any IDE for this as well.
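The scheduled workflow described above could be sketched roughly like this. This is a minimal sketch, not the official client API: `extractText` is a hypothetical stand-in for the actual LLMWhisperer client call, and the extension list is an assumption.

```typescript
import { readdir, writeFile } from "node:fs/promises";
import * as path from "node:path";

// Derive the output path for the extracted text: a .txt file saved
// adjacent to the source document, as described above.
function textPathFor(docPath: string): string {
  const ext = path.extname(docPath);
  return docPath.slice(0, docPath.length - ext.length) + ".txt";
}

// Hypothetical batch step: run on a schedule, extract text from every
// document in a folder, and save the result next to each file.
// `extractText` stands in for the LLMWhisperer client call.
async function processFolder(
  dir: string,
  extractText: (filePath: string) => Promise<string>,
): Promise<void> {
  for (const name of await readdir(dir)) {
    const ext = path.extname(name).toLowerCase();
    if (![".pdf", ".png", ".jpg", ".jpeg"].includes(ext)) continue;
    const filePath = path.join(dir, name);
    const text = await extractText(filePath);
    await writeFile(textPathFor(filePath), text, "utf8");
  }
}
```

In a real pipeline, `extractText` would wrap the client request to LLMWhisperer, and you might write into a database instead of adjacent text files.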
In my prompt, I'm going to say I want a sidebar that has a list of the different documents. Adjacent to that, I want an upload area where I can add various documents. And then finally, on the right-hand side, once the documents have been processed via a request to an App Router endpoint, I want to show the text extracted from those documents. Now, what Cursor is doing is going through and building out those various components: we have a document sidebar, we have an upload area, and finally we have the document content area where the text will be shown. We also see it has updated page.tsx, and it's going through the various functions to get everything working. If I take a look at our application, we have the documents here, we have the upload area, and then we have the area where we can see the extracted information. We also have a simple API route. Next, I'm going to install the LLMWhisperer client, so I'll run bun install llmwhisperer-client. From there, we're going to create a .env file. In this file we're going to have the LLMWhisperer API key, and we're also going to have the LLMWhisperer base URL. We can go over and grab our API key, and once that's set up, we can grab the URL for the endpoint as well. To get started, there are a few different options: you have the ability to poll the endpoint, or you can use this asynchronous method. Something that Cursor is quite good at, if you give it the context of all of that documentation, is this: I'll say, "Based on the above context, let's set up the endpoint to work for retrieving the text representation of the documents that we upload." I'll send that in and we'll wait for the results. Now let me take a look at our application, with everything minimized here.
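The .env file might look something like this. The variable names are illustrative assumptions, not mandated by the client; use whatever names your route handler reads, with the actual values copied from the LLMWhisperer dashboard.

```shell
# .env — values come from the LLMWhisperer dashboard
LLMWHISPERER_API_KEY=<your API key>
LLMWHISPERER_BASE_URL=<endpoint URL from the dashboard>
```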
The core piece that we're going to concern ourselves with is the LLMWhispererClientV2 here. First, we check whether we have the API key. From there, we initialize the LLMWhisperer client. Then we have the POST request endpoint; I'll just quickly run through this. Once we have the request, we get the form data; the file is what we concern ourselves with from the request. If there isn't a file, we just return early. From there, we do a little bit of validation on whether it's a type of file that we want to process. I see we have things like txt in there, so we can clean this up to be the allowed formats, which you can specify depending on the tier you have set up for LLMWhisperer; you can add things like images here as well. In terms of the request itself, it's pretty straightforward. This is really the core piece of all that we need: we specify the file path, we wait for completion (like I mentioned, there is a polling option as well if you'd like), we set the wait timeout, and we can specify the mode that we want to use. And basically, from there, that's all we need to do. Now, I want to test this out on an image. I found this document on GitHub: a handwritten PDF. I'm going to download this example file, and then within our application, I'm going to upload the PDF. Then, within a handful of seconds, if I go over to our documents, I'll see that it took that document and was able to decipher even the cursive writing. Here's the original document, and honestly, I'm even struggling to read this. If we look at our version, we have: problem definition, write a program to transfer the W lock from internal memory to internal memory location.
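The format validation mentioned above can be isolated as a small helper. This is a sketch: the helper name and the allowed-extension list are my assumptions, and the list should match whatever formats your LLMWhisperer tier actually supports.

```typescript
// Allowed upload formats — adjust to match the formats your
// LLMWhisperer plan supports (PDFs, Office docs, images, etc.).
const ALLOWED_EXTENSIONS = new Set([
  ".pdf", ".docx", ".png", ".jpg", ".jpeg", ".txt",
]);

// Returns true when the uploaded file's extension is one we accept.
function isAllowedFormat(filename: string): boolean {
  const dot = filename.lastIndexOf(".");
  if (dot < 0) return false;
  return ALLOWED_EXTENSIONS.has(filename.slice(dot).toLowerCase());
}
```

The route handler can call this right after pulling the file out of the form data and return a 4xx response for anything that fails the check, before making the extraction request.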
And as you can imagine, instead of struggling to read different handwriting, you can pass things into an application that you build and leverage something like this. This is just one more quick example of both how you can build with LLMWhisperer and how you can leverage it in a whole array of different applications. But otherwise, that's pretty much it for this video. I encourage you to check out LLMWhisperer. If you found this video useful, please comment, share, and subscribe. Otherwise, until the next one!