Structured Output in Practice: 7 Ways to Get Stable JSON Output from LLMs

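When you call an LLM API in production, the first engineering problem is not whether the prompt is well written but whether the JSON the model returns can be parsed reliably. According to a 2025 Anthropic developer survey, more than 62% of bugs in AI applications come from unstable output formats: an extra comma, a missing quote, or a wrong field type can each crash a downstream system. This article compares 7 techniques for getting stable JSON out of a large model, covering principles, code, performance, and cost, to help you choose the one that fits your production environment.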

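🎯 1. Why Is It So Hard to Get Reliable JSON from a Model?

1.1 The Probabilistic Nature of LLM Output

At its core, a large language model (LLM) produces a probability distribution over the next token and then samples from it, one token at a time. Nothing in that sampling loop enforces JSON grammar: the model does not track JSON's syntax rules, it only predicts the most likely next token, so any single step can break the structure.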

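This leads to three typical problems:

❌ Format errors: the model may output {name: "test"} instead of {"name": "test"} (the key is missing its quotes)
❌ Type errors: the schema requires a number, but the model outputs "123" (a string)
❌ Hallucinated fields: the model adds fields that are not in the schema on its own initiative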
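
The three failure modes can be told apart with a few lines of standard-library Python. This is a minimal sketch, not a full JSON Schema validator: `EXPECTED_FIELDS` and `check_output` are hypothetical names introduced for illustration, and the two-field schema is an assumption.

```python
import json

# Hypothetical schema for illustration: field name -> accepted Python type(s).
EXPECTED_FIELDS = {"name": str, "price": (int, float)}


def check_output(raw: str) -> list[str]:
    """Return the problems found in a raw model reply (an empty list means OK)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"format error: {exc.msg}"]
    if not isinstance(data, dict):
        return ["format error: top-level value is not an object"]

    problems = []
    for field, expected in EXPECTED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(data[field], bool) or not isinstance(data[field], expected):
            problems.append(f"type error: {field}")
    for field in data:
        if field not in EXPECTED_FIELDS:
            problems.append(f"unexpected field: {field}")
    return problems


print(check_output('{name: "test", "price": 1}'))                  # format error
print(check_output('{"name": "test", "price": "123"}'))            # type error
print(check_output('{"name": "test", "price": 123, "note": "x"}')) # unexpected field
print(check_output('{"name": "test", "price": 123}'))              # no problems
```

Only the first case is caught by parsing alone; the other two parse cleanly and need a schema-level check.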

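⚠️ Warning: In domains with strict data-integrity requirements, such as finance and healthcare, using raw model output without format validation is a serious engineering failure. Never assume the model will follow your format requirements 100% of the time.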

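1.2 The Seven Approaches at a Glance

The table below compares the 7 approaches analyzed in this article:

Approach	How it works	Reliability	Latency overhead	Cost impact	Best for
① Prompt constraints	Ask for JSON in the prompt	⭐⭐	None	None	Quick prototypes
② Few-shot examples	Guide the output with JSON examples	⭐⭐⭐	None	+token usage	Simple structures
③ Function Calling	Use the model's native tool-calling ability	⭐⭐⭐⭐	Low	+tool-definition tokens	When you need to call functions
④ JSON Mode	The API's native JSON output mode	⭐⭐⭐⭐	Low	No extra cost	Plain JSON output
⑤ Constrained Decoding	Enforce JSON grammar during token sampling	⭐⭐⭐⭐⭐	Medium	None	High-reliability needs
⑥ Structured Schema	JSON Schema + response format	⭐⭐⭐⭐⭐	Low	+schema tokens	Production
⑦ Post-processing + retry	Retry automatically when parsing fails	⭐⭐⭐⭐	High (when retrying)	+retry tokens	Fallback

💡 Tip: In production, the best practice is to combine several approaches, for example schema constraints plus post-processing and retries, rather than rely on a single one.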

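🔧 2. The Approaches in Practice: From Simple to Advanced

2.1 Approaches ① and ②: Prompt Constraints and Few-shot Examples

The most basic approach is to ask for JSON explicitly in the prompt and provide an example. Every other approach builds on this.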

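// ❌ Bad: a vague prompt; the model may answer in any format
const badPrompt = "Analyze the sentiment of this text for me";

// ✅ Good: an explicit JSON format plus a few-shot example
const goodPrompt = `
Analyze the sentiment of the following text and reply strictly in JSON.

Example input: "This product is so easy to use, highly recommended!"
Example output:
{
  "sentiment": "positive",
  "confidence": 0.95,
  "keywords": ["easy to use", "recommended"]
}

Now analyze this text: "Average quality and a bit pricey, but the service was good"
Output only the JSON and nothing else.
`;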

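The problem with this approach is reliability. According to OpenAI test data, prompt-only constraints yield correctly formatted JSON about 85-92% of the time, which means roughly 1 call in 10 can fail.

2.2 Approach ③: Function Calling

Function Calling, which OpenAI introduced in 2023, uses the model's tool-calling ability to constrain the output format. You define a "dummy function" and have the model fill the result into the function's arguments.

// Constrain JSON output with OpenAI Function Calling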
import OpenAI from "openai";

const client = new OpenAI();

async function analyzeSentiment(text) {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "user", content: `Analyze the sentiment: ${text}` }
    ],
    tools: [{
      type: "function",
      function: {
        name: "report_sentiment",
        description: "Report the sentiment analysis result for a text",
        parameters: {
          type: "object",
          properties: {
            sentiment: {
              type: "string",
              enum: ["positive", "negative", "neutral", "mixed"],
              description: "Sentiment polarity"
            },
            confidence: {
              type: "number",
              minimum: 0,
              maximum: 1,
              description: "Confidence score"
            },
            keywords: {
              type: "array",
              items: { type: "string" },
              description: "Sentiment keywords"
            }
          },
          required: ["sentiment", "confidence", "keywords"]
        }
      }
    }],
    tool_choice: {
      type: "function",
      function: { name: "report_sentiment" }
    }
  });

  // Parse the function arguments as the structured output
  const toolCall = response.choices[0].message.tool_calls[0];
  return JSON.parse(toolCall.function.arguments);
}

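// Usage
const result = await analyzeSentiment("Average quality and a bit pricey, but the service was good");
console.log(result);
// e.g. { sentiment: "mixed", confidence: 0.82, keywords: ["average", "pricey", "good"] }

Function Calling is about 95-98% reliable, because it relies on tool-calling behavior the model was specifically trained for during post-training (fine-tuning and RLHF). It has a hidden cost, though: tool definitions consume tokens. A complex schema can take 500-1000 tokens, which adds up in high-frequency workloads.

2.3 Approaches ④ and ⑥: JSON Mode and Structured Schema

OpenAI's response_format parameter is currently the most widely used approach. The json_object mode guarantees that the output is valid JSON, while the json_schema mode additionally guarantees that it conforms to the schema you define.

// OpenAI Structured Outputs (the recommended production approach)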
import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const client = new OpenAI();

// Define a Zod schema (converted to JSON Schema automatically)
const SentimentSchema = z.object({
  sentiment: z.enum(["positive", "negative", "neutral", "mixed"]),
  confidence: z.number().min(0).max(1),
  keywords: z.array(z.string()).min(1).max(10),
  summary: z.string().max(200),
  emotions: z.object({
    joy: z.number().min(0).max(1),
    anger: z.number().min(0).max(1),
    sadness: z.number().min(0).max(1),
    surprise: z.number().min(0).max(1)
  })
});

async function analyzeWithSchema(text) {
  const response = await client.chat.completions.create({
    model: "gpt-4o-2024-08-06",
    messages: [
      {
        role: "system",
        content: "You are a sentiment analysis expert. Analyze the sentiment of the user's input."
      },
      { role: "user", content: text }
    ],
    response_format: zodResponseFormat(SentimentSchema, "sentiment")
  });

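  // The API guarantees the JSON shape, but re-validate with Zod so that a refusal
  // (null content) or a constraint the API may not enforce (such as a numeric range)
  // fails loudly here instead of downstream
  return SentimentSchema.parse(JSON.parse(response.choices[0].message.content ?? "null"));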
}

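📌 Remember: OpenAI's Structured Outputs (json_schema mode) uses Constrained Decoding under the hood, which forces the model to generate only tokens that conform to the schema. A completed response therefore matches the schema 100% of the time rather than "very probably". The exceptions are refusals and responses cut off by the token limit, so check for both.

2.4 Approach ⑤: Constrained Decoding, Principles and Local Implementation

Constrained Decoding is the most reliable approach. At every sampling step, it adjusts the token probability distribution according to the JSON grammar, setting the probability of any token that would violate the grammar to 0.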
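
The masking step is easier to see on a toy example. The sketch below is an illustration under heavy assumptions, not how Outlines or any production engine is implemented: the 8-token vocabulary, the hand-written state machine `GRAMMAR`, and the `fake_logits` stand-in for a model are all hypothetical, and decoding is greedy rather than sampled.

```python
import json
import math

VOCAB = ["{", "}", '"ok"', ":", "true", "false", "hello", ","]

# Toy grammar as a state machine: state -> {allowed token: next state}.
# It accepts exactly {"ok":true} or {"ok":false}.
GRAMMAR = {
    "start": {"{": "key"},
    "key": {'"ok"': "colon"},
    "colon": {":": "value"},
    "value": {"true": "close", "false": "close"},
    "close": {"}": "done"},
}


def fake_logits(step: int) -> list[float]:
    """Stand-in for a model forward pass; it always scores 'hello' highest."""
    return [2.0 if tok == "hello" else 1.0 - 0.1 * ((i + step) % 5)
            for i, tok in enumerate(VOCAB)]


def constrained_decode() -> str:
    state, output, step = "start", [], 0
    while state != "done":
        allowed = GRAMMAR[state]
        logits = fake_logits(step)
        # Mask: forbidden tokens get -inf, which is probability 0 after softmax.
        masked = [score if tok in allowed else -math.inf
                  for tok, score in zip(VOCAB, logits)]
        best = VOCAB[masked.index(max(masked))]  # greedy pick among allowed tokens
        output.append(best)
        state = allowed[best]
        step += 1
    return "".join(output)


text = constrained_decode()
print(text)              # {"ok":true}
print(json.loads(text))  # {'ok': True}
```

Without the mask, this fake model would emit "hello" at every step. With it, the output parses every time, and the model's preferences only decide between the tokens the grammar allows (here, true versus false).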

# Constrained Decoding with the Outlines library (local model)
import outlines
import json

# Load a local model
model = outlines.models.transformers("Qwen/Qwen2.5-7B-Instruct")

# Define the JSON Schema
schema = {
    "type": "object",
    "properties": {
        "product": {"type": "string"},
        "price": {"type": "number", "minimum": 0},
        "in_stock": {"type": "boolean"},
        "tags": {
            "type": "array",
            "items": {"type": "string"},
            "maxItems": 5
        }
    },
    "required": ["product", "price", "in_stock", "tags"]
}

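# Constrain generation with the JSON Schema
# (Outlines 0.x API: generate.json expects the schema as a JSON string, not a dict)
generator = outlines.generate.json(model, json.dumps(schema))
result = generator(
    "Extract the product information from the following text: the new iPhone 16 Pro "
    "costs 7999 yuan, is currently in stock, and belongs to the phone, digital, "
    "and Apple product categories"
)

print(json.dumps(result, ensure_ascii=False, indent=2))
# Example output:
# {
#   "product": "iPhone 16 Pro",
#   "price": 7999,
#   "in_stock": true,
#   "tags": ["phone", "digital", "Apple products"]
# }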

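Constrained Decoding is 100% reliable in the sense that any output the model finishes is guaranteed to conform to the schema. The price is inference speed, which drops by roughly 10-30% because the set of valid tokens must be computed at every step.

The mainstream Constrained Decoding options compare as follows:

Tool	Language	Supported models	Performance overhead	Community activity
Outlines	Python	HuggingFace, vLLM, llama.cpp	10-20%	⭐⭐⭐⭐⭐
Guidance	Python	Azure OpenAI, Transformers	15-30%	⭐⭐⭐⭐
LMQL	Python	Multiple backends	15-25%	⭐⭐⭐
SGLang	Python	vLLM, local models	5-15%	⭐⭐⭐⭐
OpenAI API	-	GPT-4o family	Built-in optimization	⭐⭐⭐⭐⭐

💡 Tip: If you use the OpenAI or Anthropic API, use their built-in Structured Output features directly; there is no need to pull in a separate Constrained Decoding library. Tools such as Outlines are only needed for locally deployed models.

2.5 Approach ⑦: Post-processing and Retry Strategies

Whichever approach you use, you need a retry mechanism as a fallback. It is the last line of defense in production.

// Production-grade wrapper for Structured Output calls (TypeScript)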
import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

interface StructuredCallOptions<T> {
  model?: string;
  maxRetries?: number;
  retryDelay?: number;
  fallbackValue?: T;
  onRetry?: (attempt: number, error: Error) => void;
}

async function callWithSchema<T extends z.ZodType>(
  schema: T,
  messages: OpenAI.ChatCompletionMessageParam[],
  options: StructuredCallOptions<z.infer<T>> = {}
): Promise<{ data: z.infer<T>; attempts: number }> {
  const {
    model = "gpt-4o-2024-08-06",
    maxRetries = 3,
    retryDelay = 1000,
    fallbackValue,
    onRetry
  } = options;

  const client = new OpenAI();
  let lastError: Error | null = null;

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const response = await client.chat.completions.create({
        model,
        messages,
        response_format: zodResponseFormat(schema, "result"),
        temperature: 0.1  // low temperature = more stable output
      });

      const content = response.choices[0].message.content;
      if (!content) throw new Error("Empty response from model");

      const parsed = JSON.parse(content);
      const validated = schema.parse(parsed);  // Zod validation

      return { data: validated, attempts: attempt };
    } catch (err) {
      lastError = err as Error;
      if (onRetry) onRetry(attempt, lastError);
      if (attempt < maxRetries) {
        // Exponential backoff: retryDelay, then 2x, then 4x, ...
        await new Promise(r => setTimeout(r, retryDelay * 2 ** (attempt - 1)));
      }
    }
  }

  // All retries failed: return the fallback or throw
  if (fallbackValue !== undefined) {
    return { data: fallbackValue, attempts: maxRetries };
  }
  throw new Error(
    `Structured output failed after ${maxRetries} attempts: ${lastError?.message}`
  );
}

// Usage
const ProductSchema = z.object({
  name: z.string(),
  price: z.number().positive(),
  category: z.string(),
  inStock: z.boolean()
});

const { data, attempts } = await callWithSchema(
  ProductSchema,
  [{ role: "user", content: "Extract the product information: MacBook Pro 14-inch, priced at 14999 yuan, in stock" }],
  {
    maxRetries: 3,
    onRetry: (n, err) => console.warn(`Attempt ${n} failed: ${err.message}`)
  }
);
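
The same parse, validate, and retry pattern also works when you cannot rely on a provider-specific schema mode. Below is a minimal Python sketch that uses only the standard library. The names `extract_json`, `call_with_retry`, `fake_model`, and `validate_price` are hypothetical, and `fake_model` stands in for a real API call so that the example runs offline.

```python
import json
import time
from typing import Callable


def extract_json(reply: str) -> dict:
    """Parse the outermost {...} span, tolerating prose before and after it."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or start > end:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply[start:end + 1])


def call_with_retry(call_model: Callable[[str], str], prompt: str,
                    validate: Callable[[dict], None],
                    max_retries: int = 3, base_delay: float = 0.0):
    """Return (data, attempts); raise RuntimeError once every attempt has failed."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            data = extract_json(call_model(prompt))
            validate(data)  # raises ValueError on a business-rule violation
            return data, attempt
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            last_error = exc
            if attempt != max_retries:
                time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise RuntimeError(f"failed after {max_retries} attempts: {last_error}")


# Stand-in for a real model: the first reply breaks a business rule, the second is fine.
replies = iter(['Sure! {"price": -5}',
                'Here is the result: {"price": 14999} Hope that helps.'])


def fake_model(prompt: str) -> str:
    return next(replies)


def validate_price(data: dict) -> None:
    price = data.get("price")
    if not isinstance(price, (int, float)) or not price > 0:
        raise ValueError("price must be a positive number")


data, attempts = call_with_retry(fake_model, "extract the price", validate_price)
print(data, attempts)  # {'price': 14999} 2
```

Catching only `ValueError` is a deliberate choice in this sketch: parse and validation failures are worth retrying, while programming errors should surface immediately.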

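📊 3. Best Practices for Production

3.1 Decision Tree for Choosing an Approach

Pick the approach that best fits your scenario:

✅ OpenAI/Claude API + need 100% reliability → Structured Outputs (schema mode)
✅ OpenAI/Claude API + simple structure → Function Calling is enough
✅ Locally deployed model → Outlines / SGLang + Constrained Decoding
✅ Compatibility across multiple models → prompt constraints + Zod validation + retries
❌ Do not rely on prompt constraints alone → unacceptable in production
❌ Do not skip output validation → even with a schema, validate the business logic

3.2 Performance and Cost Optimization

In high-frequency workloads, schema complexity directly affects cost and latency: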

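Schema complexity	Token usage (approx.)	Extra cost per 1,000 calls (at $5 per 1M input tokens)	Latency impact
Simple (3-5 fields)	+100-200 tokens	$0.50-1.00	+50ms
Medium (10-15 fields)	+300-500 tokens	$1.50-2.50	+100ms
Complex (nested + arrays)	+500-1000 tokens	$2.50-5.00	+150ms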

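⚠️ Warning: Do not design an overly complex schema in pursuit of "perfection". A schema with 20+ fields not only costs more but also raises the chance of model errors. Prefer splitting the work into several simpler calls.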
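
The token overhead of a schema turns into money through simple arithmetic: extra tokens per call, times the number of calls, times the input price per token. Here is a minimal sketch, assuming an input price of $5 per 1M tokens. That price is an illustrative assumption, so substitute your provider's actual rate; `extra_cost_usd` is a hypothetical helper name.

```python
def extra_cost_usd(schema_tokens: int, calls: int, usd_per_million_tokens: float) -> float:
    """Cost of the extra input tokens a schema adds, summed over `calls` requests."""
    return schema_tokens * calls * usd_per_million_tokens / 1_000_000


PRICE = 5.0  # assumed input price in USD per 1M tokens

for label, tokens in [("simple", 200), ("medium", 500), ("complex", 1000)]:
    per_thousand = extra_cost_usd(tokens, 1_000, PRICE)
    per_million = extra_cost_usd(tokens, 1_000_000, PRICE)
    print(f"{label}: ${per_thousand:.2f} per 1,000 calls, ${per_million:,.0f} per 1M calls")
```

At this assumed price, a 1000-token schema adds $5 per thousand calls, which becomes $5,000 over a million calls. That is why trimming the schema pays off at scale.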

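3.3 Pitfalls to Avoid

Here are pitfalls we ran into in production:

Pitfall 1: Handling null values

Models often return null when they are unsure, but your schema may not allow it. Be sure to declare such fields with z.string().nullable() or z.string().optional() in Zod. Note that OpenAI's strict Structured Outputs requires every field to be listed as required, so there a field that may be absent has to be modeled as nullable rather than optional.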

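Pitfall 2: Numeric precision

The model may write a number as 7999.0 instead of 7999, or in scientific notation as 7.999e3. A JSON parser reads all three as the same value, but numbers with decimals are parsed into binary floating point, which cannot represent amounts such as 0.1 exactly. Where exact numbers matter (finance, for example), accept the value as z.string() and convert it yourself.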
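
The precision issue is easy to reproduce. The sketch below uses Python's standard `json` module, whose `parse_float` hook lets decimals be read as exact `Decimal` values instead of binary floats; the field names are made up for illustration.

```python
import json
from decimal import Decimal

raw = '{"subtotal": 0.1, "tax": 0.2, "qty": 3}'

as_float = json.loads(raw)                         # decimals become binary floats
as_decimal = json.loads(raw, parse_float=Decimal)  # decimals stay exact

print(as_float["subtotal"] + as_float["tax"])      # 0.30000000000000004
print(as_decimal["subtotal"] + as_decimal["tax"])  # 0.3

# The three spellings of the same number parse to equal values.
print(json.loads("7999") == json.loads("7999.0") == json.loads("7.999e3"))  # True
```

The spelling of a number is harmless once parsed. What matters is that the type it lands in can represent the value exactly.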

Pitfall 3: "Creative" enum values

Even if you define enum: ["positive", "negative"], some models may still output "Positive" or "积极". Use z.enum() in the schema and repeat the allowed values in the system prompt.
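
When you cannot use constrained decoding, a small normalization step in front of the strict enum check recovers many of these near misses. This is a minimal sketch: `normalize_sentiment` is a hypothetical helper, and the alias table is an assumption that you would extend with whatever variants appear in your own logs.

```python
ALLOWED = {"positive", "negative", "neutral", "mixed"}

# Hypothetical alias table for variants seen in model output.
ALIASES = {"积极": "positive", "消极": "negative", "中性": "neutral"}


def normalize_sentiment(value: str) -> str:
    """Map case and language variants to an allowed value, or raise ValueError."""
    cleaned = value.strip().lower()
    cleaned = ALIASES.get(cleaned, cleaned)
    if cleaned not in ALLOWED:
        raise ValueError(f"unsupported sentiment: {value!r}")
    return cleaned


print(normalize_sentiment(" Positive "))  # positive
print(normalize_sentiment("积极"))         # positive
```

Values that cannot be mapped still raise an error, so the normalizer widens what you accept without silently inventing categories.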

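Pitfall 4: Runaway array lengths

The model may return an array with 100 elements. Use z.array(schema).max(10) to cap it.

// ❌ Bad: no limit on array length
const BadSchema = z.object({
  keywords: z.array(z.string())  // the model may return 50 keywords
});

// ✅ Good: limit both array length and string length
const GoodSchema = z.object({
  keywords: z.array(z.string().max(20)).min(1).max(10),  // 1-10 items, at most 20 characters each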
  summary: z.string().min(10).max(500)
});

3.4 Structured Output Support Across Major Models

Model/Platform	JSON Mode	Schema constraints	Constrained Decoding	Reliability
OpenAI GPT-4o	✅	✅ (native)	✅ (built-in)	100%
Claude 3.5/4	✅	✅ (Tool Use)	❌	~98%
Gemini 2.0	✅	✅ (native)	❌	~97%
Qwen 2.5	✅	❌ (needs Outlines)	✅ (Outlines)	100%
Llama 3.3	❌ (needs Outlines)	❌ (needs Outlines)	✅ (Outlines)	100%
DeepSeek V3	✅	✅ (native)	❌	~96%

🔐 4. Cross-language SDK Wrappers in Practice

4.1 Python: Pydantic + Instructor

# Type-safe Structured Output with the Instructor library
from pydantic import BaseModel, Field
from typing import List
import instructor
from openai import OpenAI

# Instructor handles retries and validation automatically
client = instructor.from_openai(OpenAI())

class SentimentResult(BaseModel):
    """Sentiment analysis result"""
    sentiment: str = Field(description="Sentiment polarity", pattern="^(positive|negative|neutral|mixed)$")
    confidence: float = Field(ge=0, le=1, description="Confidence score")
    keywords: List[str] = Field(min_length=1, max_length=10, description="Keywords")

def analyze(text: str) -> SentimentResult:
    return client.chat.completions.create(
        model="gpt-4o",
        response_model=SentimentResult,  # automatic validation + retries
        max_retries=3,
        messages=[
            {"role": "user", "content": f"Analyze the sentiment: {text}"}
        ]
    )

result = analyze("This product is fantastic, highly recommended!")
print(result.sentiment)   # e.g. "positive"
print(result.confidence)  # e.g. 0.95
print(type(result))       # <class '__main__.SentimentResult'>: a Pydantic object, not a dict

4.2 Java: Spring AI + Bean Validation

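// Structured Output support in Spring AI (Spring Boot 3.3+)
import java.util.List;

import jakarta.validation.constraints.DecimalMax;
import jakarta.validation.constraints.DecimalMin;
import jakarta.validation.constraints.NotBlank;
import jakarta.validation.constraints.Size;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.converter.BeanOutputConverter;

// Define the DTO (the Bean Validation annotations only take effect when a Validator runs)
public record SentimentResult(
    @NotBlank String sentiment,
    @DecimalMin("0") @DecimalMax("1") double confidence,
    @Size(min = 1, max = 10) List<String> keywords
) {}

// Call the model. BeanOutputConverter is a regular class, not an annotation:
// it supplies format instructions for the prompt and parses the reply.
public SentimentResult analyze(ChatModel chatModel, String text) {
    var converter = new BeanOutputConverter<>(SentimentResult.class);
    var prompt = new Prompt("Analyze the sentiment: " + text + "\n" + converter.getFormat());
    var response = chatModel.call(prompt);
    // Newer Spring AI releases may name this accessor getText() instead of getContent()
    return converter.convert(response.getResult().getOutput().getContent());
}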

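💡 Summary and Recommendations

Core recommendations for choosing a Structured Output approach:

OpenAI users should default to schema mode: a 100% format guarantee built into the API, where the only extra cost is the schema's tokens
Claude users should constrain output through Tool Use, which is more reliable than JSON Mode
For local deployment, use Outlines: open source, fast, and compatible with mainstream models
Add Zod/Pydantic validation to every approach: the schema guarantees the format, validation guarantees business correctness
Production systems must have a retry mechanism: even at 99% reliability, the 1% of failures adds up under high call volume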
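
The effect of retries on that residual 1% can be worked out directly. The sketch below assumes that attempts fail independently, which is an optimistic simplification because a prompt that fails once is more likely to fail again. `failure_rate` is a hypothetical helper name.

```python
def failure_rate(success_rate: float, attempts: int) -> float:
    """Probability that every one of `attempts` independent tries fails."""
    return (1.0 - success_rate) ** attempts


for attempts in (1, 2, 3):
    rate = failure_rate(0.99, attempts)
    print(f"{attempts} attempt(s): about {rate * 1_000_000:.0f} failures per million calls")
```

Under this assumption, a single attempt at 99% reliability leaves about 10,000 failed calls per million, one retry cuts that to about 100, and two retries to about 1.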

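⚡ Key takeaway: As of 2026, Structured Output technology is mature: it is no longer "might work" but "reliable engineering". Choosing the right approach can take your AI application from demo grade to production grade.

🔗 Recommended Tools

OpenAI Structured Outputs documentation: the official and most authoritative reference
Instructor (Python): the most popular Python library for Structured Output
Outlines: the first choice for Constrained Decoding on local models
Zod: a runtime type validation library for TypeScript
jsjson.com JSON formatter: format and validate the JSON your AI outputs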