When I was seeding the database for fintechbenchmark.com with Claude-generated or Openai-generated content, I needed structured data that matched my database schema. My format of choice was JSON, and my approach was straightforward:
- Request JSON in the prompt
- Provide an example of the desired structure
- Parse Claude’s response—a text string containing JSON wrapped in markdown code blocks with triple backticks
My shortcut for creating examples? Run the query once without an example and examine what came back. Usually it was exactly what I needed.
It wasn’t elegant. The backtick delimiters occasionally went missing. But it worked reliably enough—until recently.
Enter Structured Outputs
Both Anthropic and OpenAI now offer Structured Outputs—a feature that guarantees schema-compliant JSON responses. Instead of hoping the model formats correctly, you pass a JSON Schema in your API call.
The model uses constrained decoding to mathematically guarantee the response matches your schema. No more parsing failures. No more retry logic. No more crossed fingers.
It’s elegant. It’s reliable. And it’s JSON-only.
The Format Problem
Here’s my dilemma: I need JSON for database updates, but I also need other formats. HTML for showing data to domain experts. CSV for sending to analysts as spreadsheets. Sometimes even formats like DOCX or PPTX.
With the old markdown code block approach, I have one generic parser that works for everything. It handles JSON, HTML, CSV, SQL—anything. One parser, all formats.
Structured Outputs would force me to maintain separate code paths: the new approach for JSON, the old parser for everything else. That’s more complexity, not less.
Why Only JSON?
The technical reason makes sense: constrained decoding works by compiling JSON Schema into a formal grammar that restricts token generation during inference. You can’t easily apply this technique to HTML, CSV, or other formats that lack formal schemas.
But here’s what puzzles me: if models can already surround formatted data with backticks, why not extend response_format to other types? The model already knows how to generate valid HTML, CSV, or XML. Why not guarantee the wrapper stays consistent?
My Decision
I’m sticking with markdown code blocks. The backtick approach is:
- Universal: works across all formats
- Provider-agnostic: same code for Anthropic, OpenAI, anyone
- Simple: one parser, no special cases
- Battle-tested: proven reliable across thousands of extractions
Structured Outputs is a powerful tool when you absolutely need JSON guarantees. But for developers who need flexibility across formats, the old reliable method remains the more practical choice.
Sometimes the solution that works everywhere beats the elegant solution that works nowhere else.
Leave a comment