
Prompt Engineering for Structured JSON Output from LLMs

Techniques to make Claude and other LLMs return reliable, typed JSON every time — without hallucinated fields.

March 31, 2026


The most frustrating failure mode when integrating LLMs into production code isn't hallucination — it's malformed output. You ask for JSON, you get JSON wrapped in a markdown code fence. You ask for a specific schema, you get extra fields, missing fields, or values of the wrong type. The model was helpful, just not in the way you needed.

Here are the techniques that actually reduce this in practice.

Why LLMs Struggle With Strict Schemas

LLMs are trained to be helpful conversationalists first. When you ask for JSON, the model's instinct is to be extra clear — so it adds a prose introduction, wraps the JSON in triple backticks, and sometimes appends an explanation afterward. None of that is wrong from a communication standpoint. It's just not machine-parseable.

The fix isn't prompting harder. It's removing the ambiguity about what "output" means.

The Output Contract Pattern

Make the schema the last thing in your prompt, and frame it as a contract rather than a suggestion:

const prompt = `You are a content classifier.

TASK: Classify the following blog post excerpt.

POST:
${excerpt}

Return ONLY the following JSON object with no other text, no markdown, no explanation:
{
  "category": "one of: tech | finance | health | other",
  "sentiment": "one of: positive | neutral | negative",
  "readingLevel": "one of: beginner | intermediate | advanced",
  "topKeywords": ["keyword1", "keyword2", "keyword3"]
}`

The instruction "Return ONLY", reinforced by "no other text, no markdown, no explanation", dramatically reduces wrapper prose, while the "one of:" annotation on enum fields keeps the model from inventing new categories.
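For context, here is a minimal sketch of the request body this prompt could be sent with via the Anthropic SDK's `messages.create`. The model name is a placeholder, and `temperature: 0` is an addition not shown in the prompt above — low temperature makes structured output more deterministic:

```typescript
// Builds the request body for client.messages.create (Anthropic SDK).
// The model name is a placeholder — substitute whichever Claude model you use.
function buildClassifyRequest(prompt: string) {
  return {
    model: 'claude-example-model', // placeholder, not a real model ID
    max_tokens: 512,
    temperature: 0, // deterministic sampling helps keep JSON output stable
    messages: [{ role: 'user' as const, content: prompt }],
  }
}
```

The returned object is what you'd pass to `client.messages.create(...)`; keeping it in a pure builder function also makes the request shape easy to unit-test.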

Robust JSON Extraction

Even with tight prompting, you'll occasionally get a preamble. Build extraction into your pipeline rather than assuming clean output:

function extractJson(text: string): unknown {
  // Try parsing directly first
  try {
    return JSON.parse(text.trim())
  } catch {}

  // Extract from markdown code fence
  const fenced = text.match(/```(?:json)?\s*([\s\S]*?)```/)
  if (fenced) {
    try {
      return JSON.parse(fenced[1].trim())
    } catch {}
  }

  // Extract first {...} or [...] block
  const braceMatch = text.match(/\{[\s\S]*\}/)
  const bracketMatch = text.match(/\[[\s\S]*\]/)
  const match = braceMatch ?? bracketMatch
  if (match) {
    try {
      return JSON.parse(match[0])
    } catch {}
  }

  throw new Error(`No valid JSON found in response: ${text.slice(0, 200)}`)
}

This cascade handles the three most common failure modes: no wrapper (ideal), markdown wrapper (common), and embedded JSON with prose around it (occasional).
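To make the cascade concrete, here it is exercised against all three shapes. The function is a condensed copy of extractJson above, repeated so the snippet runs standalone:

```typescript
// Condensed copy of extractJson from above, so this snippet runs on its own.
function extractJson(text: string): unknown {
  // 1. Try parsing directly
  try { return JSON.parse(text.trim()) } catch {}
  // 2. Extract from a markdown code fence
  const fenced = text.match(/```(?:json)?\s*([\s\S]*?)```/)
  if (fenced) { try { return JSON.parse(fenced[1].trim()) } catch {} }
  // 3. Extract the first {...} or [...] block
  const match = text.match(/\{[\s\S]*\}/) ?? text.match(/\[[\s\S]*\]/)
  if (match) { try { return JSON.parse(match[0]) } catch {} }
  throw new Error(`No valid JSON found in response: ${text.slice(0, 200)}`)
}

// Clean output — parses directly.
extractJson('{"ok": true}')

// Markdown-wrapped — pulled out of the fence.
extractJson('```json\n{"ok": true}\n```')

// Prose-wrapped — first {...} block extracted.
extractJson('Here is the classification:\n{"ok": true}\nHope that helps!')
```

All three calls return the same parsed object; only a response with no recoverable JSON at all reaches the final throw.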

Schema Validation After Parsing

Parsing JSON doesn't mean the schema is correct. Use Zod to validate the shape before passing it into your application:

import { z } from 'zod'

const ClassificationSchema = z.object({
  category: z.enum(['tech', 'finance', 'health', 'other']),
  sentiment: z.enum(['positive', 'neutral', 'negative']),
  readingLevel: z.enum(['beginner', 'intermediate', 'advanced']),
  topKeywords: z.array(z.string()).min(1).max(5),
})

type Classification = z.infer<typeof ClassificationSchema>

async function classifyPost(excerpt: string): Promise<Classification> {
  const response = await claude.messages.create({ /* ... */ })
  const text = response.content[0].type === 'text' ? response.content[0].text : ''
  const raw = extractJson(text)
  return ClassificationSchema.parse(raw) // throws ZodError if invalid
}

Now your function either returns a correctly typed Classification or throws with a detailed error pinpointing exactly which field failed validation. You can catch the ZodError and retry, or fall back to a default.

Constrain Enums Explicitly

Vague enum definitions cause drift. Instead of asking for "a sentiment label", enumerate all valid values directly in the prompt and repeat them in the schema:

// ❌ Vague — model might return "mixed", "uncertain", "very positive"
"sentiment": "the sentiment of the text"

// ✅ Constrained — model knows exactly what's valid
"sentiment": "exactly one of: positive | neutral | negative"

For numeric fields, add range constraints:

// ❌ Unconstrained
"confidenceScore": "a number representing confidence"

// ✅ Constrained
"confidenceScore": "integer from 1 to 10, where 10 is highest confidence"

Few-Shot Examples for Complex Schemas

For schemas with more than 4–5 fields or any nested structure, include one or two example inputs and outputs before the task:

const prompt = `You extract metadata from blog posts.

EXAMPLE INPUT:
"TypeScript generics can feel abstract at first. Here's how they pay off in real projects."

EXAMPLE OUTPUT:
{"title":"TypeScript Generics in Practice","readingTime":8,"hasCodeExamples":true,"difficulty":"intermediate"}

---

TASK: Extract metadata from this post.

INPUT:
${postContent}

Return ONLY valid JSON matching this exact structure:
{"title":"string","readingTime":"integer minutes","hasCodeExamples":"boolean","difficulty":"beginner|intermediate|advanced"}`

The example demonstrates the exact output format without spelling out every rule. Models are good at pattern-matching from examples — often better than following long lists of rules.

Retry on Validation Failure

Wire in a retry loop for robustness:

async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3
): Promise<T> {
  let lastError: Error | undefined
  for (let i = 0; i < maxAttempts; i++) {
    try {
      return await fn()
    } catch (err) {
      lastError = err as Error
      if (i < maxAttempts - 1) {
        await new Promise(r => setTimeout(r, 500 * 2 ** i)) // 500ms → 1s → 2s
      }
    }
  }
  throw lastError
}

const result = await withRetry(() => classifyPost(excerpt))

Three attempts with exponential backoff handles transient failures without hammering the API.
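A refinement worth considering — not covered above, so treat it as an assumption rather than part of the core pattern: on validation failure, feed the Zod error back into the retry prompt so the model can correct its own output instead of guessing again. A minimal prompt-builder sketch:

```typescript
// Builds a follow-up prompt that shows the model its invalid previous output
// and the validation error, then re-asserts the output contract.
function buildRepairPrompt(
  originalPrompt: string,
  badOutput: string,
  validationError: string
): string {
  return [
    originalPrompt,
    '',
    'Your previous response failed validation.',
    `Previous response: ${badOutput}`,
    `Validation error: ${validationError}`,
    '',
    'Return ONLY corrected JSON matching the schema. No other text.',
  ].join('\n')
}
```

On a ZodError, send `buildRepairPrompt(prompt, rawText, error.message)` as the next attempt — in practice a model shown its exact mistake corrects it far more often than one simply asked again.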

Key Takeaways

  • End your prompt with the exact JSON schema and use "Return ONLY" — wrapper prose drops significantly
  • Always extract JSON defensively; don't assume clean output even when prompting is tight
  • Validate with Zod after parsing — parsing succeeds on any valid JSON, validation catches wrong shapes
  • Enumerate every valid enum value explicitly in the prompt schema
  • Add few-shot examples for complex or nested schemas