LLM Engineering

Structured Output Extraction

Turn messy text into reliable typed data with an LLM: a strict schema, tool or JSON mode, validation, and a repair loop that never ships bad shapes.

llm, structured-output, json, extraction, validation

SKILL.md

name: structured-output-extraction
description: Use when extracting structured, typed data from unstructured text with an LLM (parsing documents, classifying, pulling fields) and the downstream code needs a guaranteed shape.

Structured Output Extraction

When to trigger

The task is "read this text and give me these fields as data": parsing an email into a ticket, pulling line items from an invoice, classifying a message into an enum.

Do not parse free text

Asking for JSON in the prompt and then parsing the reply is the fragile path. The model may wrap the JSON in prose, add a trailing comment, or use the wrong key. Instead:

Use the provider's structured-output or tool-calling mode so the model is constrained to your schema.
Define the schema once and share it between the model call and your validator (for example a zod schema you also convert to the tool input schema).
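
A minimal sketch of "define once, share twice", using only the Python standard library rather than zod. The ticket fields, the `record_ticket` tool name, and the `input_schema` wrapper key are illustrative assumptions; the exact tool-definition shape differs by provider, and a real project would likely use a full schema library instead of this hand-rolled validator, which covers only the small JSON Schema subset used here.

```python
from typing import Any

# Hypothetical ticket schema, written once as plain JSON Schema.
TICKET_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "One-line summary of the request."},
        "priority": {
            "type": "string",
            "enum": ["low", "medium", "high"],
            "description": "Exactly one of: low, medium, high.",
        },
    },
    "required": ["title", "priority"],
    "additionalProperties": False,
}


def tool_definition(schema: dict[str, Any]) -> dict[str, Any]:
    """Wrap the schema for the model call (key names are illustrative)."""
    return {
        "name": "record_ticket",
        "description": "Record the extracted ticket.",
        "input_schema": schema,
    }


def validate(data: Any, schema: dict[str, Any]) -> list[str]:
    """Check data against the same schema; return a list of error strings."""
    if not isinstance(data, dict):
        return ["expected an object"]
    errors = []
    props = schema["properties"]
    for key in schema.get("required", []):
        if key not in data:
            errors.append(f"{key}: missing required field")
    for key, value in data.items():
        if key not in props:
            if schema.get("additionalProperties") is False:
                errors.append(f"{key}: unexpected field")
            continue
        spec = props[key]
        if spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{key}: expected string")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{key}: must be one of {spec['enum']}")
    return errors
```

Because `tool_definition` and `validate` both read `TICKET_SCHEMA`, a change to the allowed priorities reaches the model call and the validator together.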

Steps

Write the schema as the source of truth: required fields required, enums for closed sets, descriptions on ambiguous fields so the model fills them correctly.
Call the model in structured mode with that schema.
Validate the result against the same schema. A validation failure is a signal, not a surprise.
On failure, run a repair pass: send the model its own invalid output together with the validation error and ask it to fix the shape. Cap repairs at one or two attempts, then fall back.
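
The steps above can be sketched as a capped loop. This is an illustration under assumptions, not a provider integration: `call_model` is a hypothetical stand-in for a structured-mode call that returns the raw JSON string, `validate` is any function that returns a list of error strings, and the prompt wording is made up.

```python
import json
from typing import Any, Callable, Optional

MAX_REPAIRS = 2  # cap on repair passes: one or two, as recommended above


def extract_with_repair(
    text: str,
    call_model: Callable[[str], str],       # hypothetical model call
    validate: Callable[[Any], list[str]],   # returns [] when the shape is valid
    max_repairs: int = MAX_REPAIRS,
) -> Optional[dict]:
    """Return a validated dict, or None so the caller can fall back."""
    prompt = f"Extract the fields from this text:\n{text}"
    for _ in range(max_repairs + 1):  # first attempt plus the capped repairs
        raw = call_model(prompt)
        try:
            data = json.loads(raw)
            errors = validate(data)
        except json.JSONDecodeError as exc:
            data, errors = None, [f"invalid JSON: {exc.msg}"]
        if not errors:
            return data
        # Repair pass: show the model its own output and the validation errors.
        prompt = (
            "Your previous output did not match the schema.\n"
            f"Output: {raw}\n"
            f"Errors: {'; '.join(errors)}\n"
            "Return corrected JSON only."
        )
    return None  # caller routes this to a human or a fallback record
```

Returning `None` instead of raising keeps the fallback decision in the caller, which knows whether a human queue or a placeholder record is appropriate.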

Pitfalls

Enums drift: the model returns "In Progress" when the schema wants "in_progress". Put the exact allowed values in the field description and validate.
Optional vs required confusion produces nulls the downstream code did not expect. Make required fields required in the schema, not just in the prompt.
An unbounded repair loop can spin on genuinely unparseable input. Cap the retries and route whatever still fails to a human or a fallback record.
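
A small sketch of the first two pitfalls, assuming a hypothetical `status` field. Generating the description from the same tuple as the enum keeps the allowed values the model sees identical to the ones the validator enforces, and the `required` flag makes the null policy explicit in code rather than in the prompt. The helper names are made up for illustration.

```python
from typing import Any, Optional

STATUS_VALUES = ("open", "in_progress", "done")  # hypothetical closed set


def enum_field(values: tuple[str, ...], what: str) -> dict[str, Any]:
    """Build an enum field whose description lists the exact allowed values."""
    allowed = ", ".join(f'"{v}"' for v in values)
    return {
        "type": "string",
        "enum": list(values),
        "description": f"{what} Use exactly one of: {allowed}.",
    }


def check_field(
    name: str, value: Any, values: tuple[str, ...], required: bool
) -> Optional[str]:
    """Return an error message, or None when the value is acceptable."""
    if value is None:
        # Null is accepted only for fields explicitly marked optional.
        return f"{name}: required, got null" if required else None
    if value not in values:
        # Near misses such as "In Progress" are rejected, not silently coerced.
        return f"{name}: {value!r} is not one of {list(values)}"
    return None
```

The error string from `check_field` is the kind of message worth passing back to the model in a repair pass, since it names both the bad value and the allowed set.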

Refusal and Fallback Handling

Added 2026-07-01.
