API Rate Limiting in Practice: Complete Implementations of Sliding Window, Token Bucket, and Distributed Rate Limiting

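API rate limiting is one of the most easily overlooked pieces of backend infrastructure, and one of the most damaging when it fails. According to Datadog's 2025 incident report, more than 23% of production outages were directly related to traffic spikes, and most of those could have been avoided with a sensible rate-limiting strategy. Ironically, many developers spend weeks optimizing database queries and caching, yet settle for a bare setTimeout as their rate limiter, or have none at all, until a script kiddie's brute-force requests take the service down in the middle of the night.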

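This article is not a rehash of an algorithms textbook. Drawing on production experience, I will walk through implementing four rate-limiting algorithms, compare how they behave in different scenarios, and provide complete code for three layers of rate limiting: Node.js, Redis, and Nginx. Whether you are building a public API, protecting internal microservices, or controlling traffic at the edge, you should come away with something you can reuse directly.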

🔐 1. Four Rate-Limiting Algorithms: Principles and Implementation

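1.1 Fixed Window: Simple, but with a Fatal Flaw

The fixed window is the most intuitive approach: divide time into windows of fixed length (for example, one minute), count the requests inside each window, and reject anything above the threshold.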

// Fixed window rate limiter: the simplest implementation
class FixedWindowRateLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.counters = new Map(); // key → { count, windowStart }
  }

  isAllowed(key) {
    const now = Date.now();
    const current = this.counters.get(key);

    // No record yet, or a new window has started
    if (!current || now - current.windowStart >= this.windowMs) {
      this.counters.set(key, { count: 1, windowStart: now });
      return { allowed: true, remaining: this.maxRequests - 1 };
    }

    // Still inside the current window
    if (current.count < this.maxRequests) {
      current.count++;
      return { allowed: true, remaining: this.maxRequests - current.count };
    }

    // Limit exceeded: report how long until the window resets (ms)
    const retryAfter = this.windowMs - (now - current.windowStart);
    return { allowed: false, remaining: 0, retryAfter };
  }
}

// Usage
const limiter = new FixedWindowRateLimiter(100, 60000); // 100 requests per minute
const result = limiter.isAllowed('user:123');
console.log(result); // { allowed: true, remaining: 99 }

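The fatal problem with fixed windows is the boundary burst: if a user sends 100 requests in the last second of window A and another 100 in the first second of window B, the service handles 200 requests within 2 seconds, twice the limit.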
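The boundary burst is easy to reproduce. The sketch below is a stripped-down fixed-window counter that takes the timestamp as an argument, so the scenario can be replayed without waiting for real time to pass. The helper names (`createFixedWindow`, `simulateBoundaryBurst`) and the request timings are made up for illustration.

```javascript
// Minimal fixed-window counter with an injectable timestamp.
function createFixedWindow(maxRequests, windowMs) {
  let windowStart = null;
  let count = 0;
  return function isAllowed(now) {
    if (windowStart === null || now - windowStart >= windowMs) {
      windowStart = now; // a new window starts with this request
      count = 0;
    }
    if (count < maxRequests) {
      count++;
      return true;
    }
    return false;
  };
}

// Hypothetical scenario: a limit of 100 requests per 60 s.
function simulateBoundaryBurst() {
  const isAllowed = createFixedWindow(100, 60_000);
  let accepted = 0;

  // One request at t = 0 opens window A
  if (isAllowed(0)) accepted++;
  // 99 more requests in the last second of window A (t = 59.00 s .. 59.98 s)
  for (let i = 0; i < 99; i++) {
    if (isAllowed(59_000 + i * 10)) accepted++;
  }
  // 100 requests in the first second of window B (t = 60.00 s .. 60.99 s)
  for (let i = 0; i < 100; i++) {
    if (isAllowed(60_000 + i * 10)) accepted++;
  }
  return accepted;
}

console.log(simulateBoundaryBurst()); // 200
```

All 200 requests are accepted even though 199 of them arrive within about two seconds.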

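⚠️ **Warning:** Fixed windows only suit scenarios that tolerate imprecision, such as sampling limits for log collection. For API rate limiting, use at least a sliding window.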

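1.2 Sliding Window Log: Precise, but Memory-Hungry

A sliding window log records the timestamp of every request. The window slides with time, so the limiter always checks the number of requests in the past N seconds.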

// Sliding window log rate limiter: exact counting
class SlidingWindowLogLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.logs = new Map(); // key → [timestamp1, timestamp2, ...]
  }

  isAllowed(key) {
    const now = Date.now();
    const windowStart = now - this.windowMs;

    // Get or initialize the log for this key
    if (!this.logs.has(key)) {
      this.logs.set(key, []);
    }
    const timestamps = this.logs.get(key);

    // Drop expired timestamps (this is the window sliding forward)
    while (timestamps.length > 0 && timestamps[0] <= windowStart) {
      timestamps.shift();
    }

    // Check the number of requests inside the current window
    if (timestamps.length < this.maxRequests) {
      timestamps.push(now);
      return {
        allowed: true,
        remaining: this.maxRequests - timestamps.length,
        total: this.maxRequests
      };
    }

    // Limit exceeded: wait until the oldest request leaves the window (ms)
    const retryAfter = timestamps[0] + this.windowMs - now;
    return { allowed: false, remaining: 0, retryAfter };
  }
}

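The sliding window log is the most precise option, but its memory cost grows linearly with the number of requests. With a limit of 10,000 requests per minute, each key may hold up to 10,000 timestamps. With many active keys, that adds up to a lot of memory.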
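A back-of-the-envelope calculation shows how quickly this grows. The sketch assumes 8 bytes per timestamp (a bare double) and a hypothetical 50,000 active keys. Real usage in V8 or Redis is higher because of per-element and per-key overhead, so treat the result as a lower bound.

```javascript
// Rough lower bound on memory for a sliding window log.
// Assumption: each stored timestamp costs about 8 bytes.
function estimateLogMemoryBytes(maxRequests, activeKeys, bytesPerTimestamp = 8) {
  return maxRequests * activeKeys * bytesPerTimestamp;
}

// Hypothetical workload: 10,000 requests per window, 50,000 active keys
const logBytes = estimateLogMemoryBytes(10_000, 50_000);
console.log(`${(logBytes / 1024 ** 3).toFixed(2)} GiB`); // 3.73 GiB

// For comparison, two counters per key (as in the sliding window counter)
const counterBytes = estimateLogMemoryBytes(2, 50_000);
console.log(`${(counterBytes / 1024).toFixed(0)} KiB`); // 781 KiB
```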

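1.3 Sliding Window Counter: The Best Balance of Precision and Performance

The sliding window counter is one of the most widely used approaches in production. It combines the low memory footprint of a fixed window with most of the precision of a sliding window: it estimates the request count in the sliding window by weighting the counts of two adjacent fixed windows.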
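To make the weighting concrete, here is a small worked example. The function name `estimateSlidingCount` and the numbers are made up for illustration. The formula is the one used by the limiter in this section: the previous window's count is scaled by the fraction of it that still overlaps the sliding window.

```javascript
// estimated = prevCount * (1 - timeInWindow / windowMs) + currCount
function estimateSlidingCount(prevCount, currCount, timeInWindowMs, windowMs) {
  const prevWeight = 1 - timeInWindowMs / windowMs;
  return prevCount * prevWeight + currCount;
}

// Hypothetical numbers: 80 requests in the previous minute, 30 so far in the
// current minute, and we are 15 s (25%) into the current window.
console.log(estimateSlidingCount(80, 30, 15_000, 60_000)); // 90
```

With a limit of 100, the next request is allowed because the estimate (90) is below the limit, even though the two raw counters add up to 110.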

// Sliding window counter: production-style implementation
class SlidingWindowCounterLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    // Per key: { prevCount, currCount, currWindowStart }
    this.windows = new Map();
  }

  isAllowed(key) {
    const now = Date.now();
    const windowStart = Math.floor(now / this.windowMs) * this.windowMs;

    if (!this.windows.has(key)) {
      this.windows.set(key, { prevCount: 0, currCount: 0, currWindowStart: windowStart });
    }

    const win = this.windows.get(key);

    // Entering a new window: the current window becomes the previous one
    if (windowStart > win.currWindowStart) {
      const elapsed = windowStart - win.currWindowStart;
      if (elapsed >= this.windowMs * 2) {
        // Skipped at least one whole window, so the old count no longer applies
        win.prevCount = 0;
      } else {
        win.prevCount = win.currCount;
      }
      win.currCount = 0;
      win.currWindowStart = windowStart;
    }

    // Weighted estimate: remaining share of the previous window + current count
    const timeInWindow = now - windowStart;
    const weight = 1 - (timeInWindow / this.windowMs);
    const estimatedCount = win.prevCount * weight + win.currCount;

    if (estimatedCount < this.maxRequests) {
      win.currCount++;
      return {
        allowed: true,
        // Clamp at 0: the fractional estimate can leave less than one slot
        remaining: Math.max(0, Math.floor(this.maxRequests - estimatedCount - 1)),
        estimated: Math.ceil(estimatedCount) + 1
      };
    }

    // Conservative hint: wait until the current fixed window ends
    const retryAfter = this.windowMs - timeInWindow;
    return { allowed: false, remaining: 0, retryAfter };
  }
}

// Usage
const limiter = new SlidingWindowCounterLimiter(100, 60000);
for (let i = 0; i < 105; i++) {
  const result = limiter.isAllowed('api-key:abc');
  if (!result.allowed) {
    console.log(`Request ${i + 1} rejected, retry in ${result.retryAfter}ms`);
    break;
  }
}
// Output: Request 101 rejected, retry in xxxms

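💡 **Tip:** Cloudflare has publicly described its rate limiting as a sliding window counter approximation. Not every platform uses it, though: Stripe, for example, has described a token bucket for its API rate limiter. For most APIs the sliding window counter is a good balance of precision, memory, and performance.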

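1.4 Token Bucket: Allowing Bursts

The token bucket algorithm adds tokens to a bucket at a fixed rate, and each request consumes one token. The bucket has a maximum capacity, so the tokens stored in a full bucket allow a bounded burst of traffic.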

// Token bucket rate limiter: supports bursts
class TokenBucketRateLimiter {
  constructor(maxTokens, refillRate, refillIntervalMs) {
    this.maxTokens = maxTokens;
    this.refillRate = refillRate;             // tokens added per refill
    this.refillIntervalMs = refillIntervalMs; // time between refills
    this.buckets = new Map(); // key → { tokens, lastRefill }
  }

  isAllowed(key, tokensNeeded = 1) {
    const now = Date.now();

    if (!this.buckets.has(key)) {
      this.buckets.set(key, { tokens: this.maxTokens, lastRefill: now });
    }

    const bucket = this.buckets.get(key);

    // Refill in whole intervals. Advance lastRefill only by the intervals
    // actually credited: resetting it to `now` would discard partial
    // intervals and starve clients that call more often than the interval
    const intervals = Math.floor((now - bucket.lastRefill) / this.refillIntervalMs);
    if (intervals > 0) {
      bucket.tokens = Math.min(this.maxTokens, bucket.tokens + intervals * this.refillRate);
      bucket.lastRefill += intervals * this.refillIntervalMs;
    }

    // Try to consume tokens
    if (bucket.tokens >= tokensNeeded) {
      bucket.tokens -= tokensNeeded;
      return {
        allowed: true,
        remaining: bucket.tokens,
        tokens: tokensNeeded
      };
    }

    // Not enough tokens: compute how long until the deficit is refilled
    const deficit = tokensNeeded - bucket.tokens;
    const waitTime = Math.ceil(deficit / this.refillRate) * this.refillIntervalMs
      - (now - bucket.lastRefill);
    return { allowed: false, remaining: bucket.tokens, retryAfter: waitTime };
  }
}

// Refill 10 tokens per second, capacity 100 (a burst of up to 100 requests,
// which is 10 seconds' worth of refill)
const limiter = new TokenBucketRateLimiter(100, 10, 1000);

// Simulate a burst
for (let i = 0; i < 110; i++) {
  const result = limiter.isAllowed('user:456');
  if (!result.allowed) {
    console.log(`Request ${i + 1} rejected, tokens left: ${result.remaining}`);
    break;
  }
}

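⚡ **Key takeaway:** The token bucket fits scenarios that need burst tolerance, such as a user clicking a button several times in quick succession, while still capping the long-term rate. The leaky bucket is the opposite: it processes requests at a constant rate and smooths traffic completely, which suits cases where the downstream service needs strong protection.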
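The article does not implement the leaky bucket, so here is a minimal sketch of one way to model it. Instead of running a real queue with timers, it assigns each admitted request a dispatch time, and dispatch times are spaced exactly one interval apart. The class name `LeakyBucketScheduler`, its parameters, and the key `svc:orders` are hypothetical. The caller is assumed to wait `delayMs` (for example with `setTimeout`) before forwarding the request.

```javascript
// Leaky bucket modelled as a scheduler: output is spaced at a constant rate.
class LeakyBucketScheduler {
  constructor(capacity, ratePerSec, clock = Date.now) {
    this.capacity = capacity;            // max requests held in the bucket
    this.intervalMs = 1000 / ratePerSec; // spacing between dispatches
    this.clock = clock;                  // injectable for testing
    this.nextSlot = new Map();           // key → earliest free dispatch time
  }

  schedule(key) {
    const now = this.clock();
    const slot = Math.max(now, this.nextSlot.get(key) ?? now);
    const delayMs = slot - now;
    // Number of requests already waiting ahead of this one
    const queued = Math.ceil(delayMs / this.intervalMs);
    if (queued >= this.capacity) {
      // Conservative hint: one slot frees up per interval
      return { allowed: false, retryAfter: Math.ceil(this.intervalMs) };
    }
    this.nextSlot.set(key, slot + this.intervalMs);
    return { allowed: true, delayMs };
  }
}

let fakeNow = 0; // fake clock so the example is deterministic
const shaper = new LeakyBucketScheduler(3, 10, () => fakeNow);
const burst = Array.from({ length: 4 }, () => shaper.schedule('svc:orders'));
console.log(burst.map((r) => (r.allowed ? r.delayMs : 'rejected')));
// [ 0, 100, 200, 'rejected' ]
```

Four requests arrive at the same instant. Three are admitted and released 100 ms apart, and the fourth overflows the bucket. A token bucket with the same capacity would have let all three through immediately.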

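📊 2. Algorithm Comparison and Selection Guide

Choosing a rate-limiting algorithm means weighing four dimensions: precision, memory, burst tolerance, and implementation complexity: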

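Algorithm	Precision	Memory footprint	Burst tolerance	Ease of implementation	Recommended for
Fixed window	⭐⭐	⭐⭐⭐⭐⭐ Very low	❌ Bursts at window boundaries	⭐⭐⭐⭐⭐ Simplest	Log sampling, non-critical limits
Sliding window log	⭐⭐⭐⭐⭐ Highest	⭐ Very high (O(n))	✅ Precisely controlled	⭐⭐⭐ Moderate	Low-traffic, high-precision cases
Sliding window counter	⭐⭐⭐⭐ High	⭐⭐⭐⭐ Low (O(1))	⚠️ Slight tolerance	⭐⭐⭐⭐ Simple	✅ First choice for most APIs
Token bucket	⭐⭐⭐ Medium	⭐⭐⭐⭐ Low (O(1))	✅ Explicit burst capacity	⭐⭐⭐ Moderate	APIs that need burst tolerance
Leaky bucket	⭐⭐⭐⭐ High	⭐⭐⭐⭐ Low (O(1))	❌ Fully smoothed	⭐⭐⭐ Moderate	Protecting downstream services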

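📌 **Remember:** There is no best rate-limiting algorithm, only the most suitable one. For public APIs, the sliding window counter is a good default (high precision, low memory); between internal microservices, prefer the token bucket (bursts allowed); for traffic shaping, prefer the leaky bucket (smooth output).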

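🚀 3. Production-Grade Distributed Rate Limiting

Per-instance rate limiting on its own does not enforce a global limit in a distributed deployment: with 10 instances each allowing 100 requests per minute, a user can actually send 1,000 requests per minute. The core of distributed rate limiting is shared state, and Redis is the most common choice.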

3.1 Sliding Window Rate Limiting with Redis

// Redis sliding window rate limiter using a Sorted Set
// Each request is one member; its timestamp is the score
const crypto = require('crypto');
const express = require('express');
const Redis = require('ioredis');

const app = express();
const redis = new Redis();

async function slidingWindowRateLimit(key, maxRequests, windowMs) {
  const now = Date.now();
  const windowStart = now - windowMs;
  const redisKey = `ratelimit:${key}`;
  // Unique member ID generated on the client. In some Redis versions the
  // script PRNG is seeded identically on every run, so math.random could
  // make same-millisecond requests overwrite each other
  const member = `${now}-${crypto.randomUUID()}`;

  // The Lua script guarantees atomicity
  const luaScript = `
    local key = KEYS[1]
    local windowStart = tonumber(ARGV[1])
    local now = tonumber(ARGV[2])
    local maxRequests = tonumber(ARGV[3])
    local windowMs = tonumber(ARGV[4])
    local member = ARGV[5]

    -- Remove requests that have fallen out of the window
    redis.call('ZREMRANGEBYSCORE', key, '-inf', windowStart)

    -- Count the requests in the current window
    local count = redis.call('ZCARD', key)

    if count < maxRequests then
      -- Allowed: record this request
      redis.call('ZADD', key, now, member)
      redis.call('PEXPIRE', key, windowMs)
      return {1, maxRequests - count - 1}
    else
      -- Rejected: use the oldest request's timestamp to compute retryAfter
      local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
      local retryAfter = tonumber(oldest[2]) + windowMs - now
      return {0, 0, retryAfter}
    end
  `;

  const result = await redis.eval(
    luaScript, 1, redisKey,
    windowStart.toString(), now.toString(),
    maxRequests.toString(), windowMs.toString(), member
  );

  return {
    allowed: result[0] === 1,
    remaining: result[1],
    retryAfter: result[2] || 0
  };
}

// Usage: Express middleware
// `scope` namespaces the Redis key so that limiters mounted on overlapping
// paths (such as /api/ and /api/auth/) do not share one counter
function rateLimitMiddleware(scope, maxRequests, windowMs) {
  return async (req, res, next) => {
    try {
      const key = `${scope}:${req.ip}`; // or req.headers['x-api-key']
      const result = await slidingWindowRateLimit(key, maxRequests, windowMs);

      // Set the conventional rate limit response headers
      res.set({
        'X-RateLimit-Limit': maxRequests,
        'X-RateLimit-Remaining': result.remaining,
        'X-RateLimit-Reset': Math.ceil((Date.now() + (result.retryAfter || windowMs)) / 1000)
      });

      if (!result.allowed) {
        res.set('Retry-After', Math.ceil(result.retryAfter / 1000));
        return res.status(429).json({
          error: 'Too Many Requests',
          message: `Rate limit exceeded. Retry after ${Math.ceil(result.retryAfter / 1000)} seconds.`
        });
      }

      next();
    } catch (err) {
      next(err); // hand Redis errors to the Express error handler
    }
  };
}

// Apply the rate limiting middleware
app.use('/api/', rateLimitMiddleware('api', 100, 60000)); // 100 requests per minute
app.use('/api/auth/', rateLimitMiddleware('auth', 10, 60000)); // stricter for auth endpoints

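⚠️ **Warning:** The key to rate limiting with Redis is atomicity. Do not use a three-step GET, check, SET sequence: under high concurrency it races, and more requests get through than the limit allows. The Lua script above makes the whole operation execute atomically inside Redis.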
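The race can be seen without a Redis server. The sketch below uses an in-memory stand-in with asynchronous `get`/`set`, which is an illustration only and not a real Redis client. It fires 20 concurrent requests against a limit of 5, once with the three-step sequence and once with a single atomic check-and-increment (the role the Lua script plays). All names here are made up for the example.

```javascript
// In-memory stand-in for a remote store; every call yields to the event loop
// the way a network round trip would.
function createFakeStore() {
  const data = new Map();
  const tick = () => new Promise((resolve) => setImmediate(resolve));
  return {
    async get(key) { await tick(); return data.get(key) ?? 0; },
    async set(key, value) { await tick(); data.set(key, value); },
    // Atomic check-and-increment: nothing can interleave between read and write
    async incrIfBelow(key, limit) {
      await tick();
      const current = data.get(key) ?? 0;
      if (current >= limit) return false;
      data.set(key, current + 1);
      return true;
    },
  };
}

async function naiveAllow(store, key, limit) {
  const count = await store.get(key);  // step 1: read
  if (count >= limit) return false;    // step 2: decide
  await store.set(key, count + 1);     // step 3: write a value that may be stale
  return true;
}

async function compare() {
  const limit = 5;
  const naiveStore = createFakeStore();
  const atomicStore = createFakeStore();
  const naive = await Promise.all(
    Array.from({ length: 20 }, () => naiveAllow(naiveStore, 'k', limit))
  );
  const atomic = await Promise.all(
    Array.from({ length: 20 }, () => atomicStore.incrIfBelow('k', limit))
  );
  return {
    naiveAllowed: naive.filter(Boolean).length,
    atomicAllowed: atomic.filter(Boolean).length,
  };
}

compare().then(console.log);
```

In the naive path every concurrent request reads the counter before any of them has written it back, so far more than 5 are admitted. The atomic path admits exactly 5.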

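3.2 Sliding Window Counter on Redis (More Efficient)

The Sorted Set approach uses a lot of memory under high concurrency (one member per request). The version below implements the sliding window counter with a Redis Hash, which needs only two counters per key no matter how many requests arrive: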

// Redis sliding window counter: a more memory-efficient implementation
async function slidingWindowCounterLimit(key, maxRequests, windowMs) {
  const now = Date.now();
  const currentWindow = Math.floor(now / windowMs) * windowMs;
  const previousWindow = currentWindow - windowMs;
  const staleWindow = previousWindow - windowMs; // no longer needed
  const timeInWindow = now - currentWindow;
  const weight = 1 - (timeInWindow / windowMs);

  const luaScript = `
    local key = KEYS[1]
    local previousWindow = ARGV[1]
    local currentWindow = ARGV[2]
    local maxRequests = tonumber(ARGV[3])
    local windowMs = tonumber(ARGV[4])
    local weight = tonumber(ARGV[5])
    local now = tonumber(ARGV[6])
    local staleWindow = ARGV[7]

    -- Drop the window before the previous one so the hash does not keep growing
    redis.call('HDEL', key, staleWindow)

    -- Read the counts for the previous and current windows
    local prevCount = tonumber(redis.call('HGET', key, previousWindow) or 0)
    local currCount = tonumber(redis.call('HGET', key, currentWindow) or 0)

    -- Weighted estimate
    local estimated = prevCount * weight + currCount

    if estimated < maxRequests then
      -- Atomically increment the current window's counter
      redis.call('HINCRBY', key, currentWindow, 1)
      -- Set an expiry so idle keys are cleaned up
      redis.call('PEXPIRE', key, windowMs * 2)
      local remaining = math.max(0, math.floor(maxRequests - estimated - 1))
      return {1, remaining, math.ceil(estimated) + 1, 0}
    else
      -- Same shape as the allowed case, so retryAfter is always the 4th element
      local retryAfter = windowMs - (now - tonumber(currentWindow))
      return {0, 0, math.ceil(estimated), retryAfter}
    end
  `;

  const result = await redis.eval(
    luaScript, 1, `ratelimit:counter:${key}`,
    previousWindow.toString(), currentWindow.toString(),
    maxRequests.toString(), windowMs.toString(),
    weight.toString(), now.toString(), staleWindow.toString()
  );

  return {
    allowed: result[0] === 1,
    remaining: result[1],
    estimated: result[2],
    retryAfter: result[3] || 0
  };
}

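3.3 Multi-Layer Rate Limiting Architecture

In production, rate limiting should be layered: coarse-grained filtering at the edge, fine-grained control in the application: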

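# Nginx-layer rate limiting: coarse-grained, protects the backend
# nginx.conf

http {
    # Zone keyed by client IP: 10 requests per second
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

    # Zone keyed by API key: 1000 requests per minute
    map $http_x_api_key $api_key {
        default $http_x_api_key;
        ""      $binary_remote_addr;  # fall back to the IP when no API key is sent
    }
    limit_req_zone $api_key zone=apikey:10m rate=1000r/m;

    server {
        # Return 429 instead of the default 503 when a request is limited
        limit_req_status 429;

        location /api/ {
            # Coarse-grained per-IP limit. With nodelay, up to 20 excess
            # requests are served immediately rather than queued
            limit_req zone=api burst=20 nodelay;

            proxy_pass http://backend;
        }

        location /api/v1/ {
            # This longer prefix wins over /api/, so the per-IP limit is repeated here
            limit_req zone=api burst=20 nodelay;
            # Fine-grained per-API-key limit
            limit_req zone=apikey burst=50 nodelay;
            proxy_pass http://backend;
        }
    }
}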

What each of the three layers contributes:

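Layer	Location	Granularity	Response time	Purpose
CDN/edge	Cloudflare/Lambda@Edge	Per IP, coarse-grained	<5ms	Absorb DDoS and brute-force scans
Nginx/gateway	Reverse proxy layer	IP + API key	<10ms	Filter out most over-limit requests
Application	Node.js/Java middleware	User + endpoint + business rules	<50ms	Fine-grained, business-aware limits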

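⚡ **Key takeaway:** The closer to the user (the edge) you reject a request, the better. A request rejected at the CDN never reaches your origin, whereas one rejected in the application layer has already consumed a connection, a proxy hop, and usually a Redis round trip. Never leave all rate limiting to the application layer.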

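💡 4. Rate Limit Response Design and Best Practices

4.1 Standard Rate Limit Response Headers

The `X-RateLimit-*` response headers are a de facto convention rather than a published RFC (an IETF draft proposes standardized `RateLimit` header fields), and many major APIs, GitHub among them, use some variant of them. The 429 status code itself is defined in RFC 6585: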

// Express middleware that sets rate limit response headers
function rateLimitHeaders(limit, remaining, resetTimestamp) {
  return (req, res, next) => {
    res.set({
      // Total quota for the current time window
      'X-RateLimit-Limit': limit,
      // Quota remaining in the current window
      'X-RateLimit-Remaining': remaining,
      // Unix timestamp (seconds) at which the quota resets
      'X-RateLimit-Reset': Math.ceil(resetTimestamp / 1000),
    });
    next();
  };
}

// A consistent format for the 429 response body
function rateLimitExceeded(res, retryAfterSeconds, detail) {
  res.status(429).set('Retry-After', retryAfterSeconds).json({
    type: 'https://tools.ietf.org/html/rfc6585#section-4',
    title: 'Too Many Requests',
    status: 429,
    detail: detail,
    retryAfter: retryAfterSeconds
  });
}
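In practice the header values come from the limiter's result rather than from fixed arguments. The helper below is a hypothetical sketch (`buildRateLimitHeaders` is not part of Express or any library) that derives them from the `{ allowed, remaining, retryAfter }` shape used by the limiters in this article, with `retryAfter` in milliseconds.

```javascript
// Derive response header values from a limiter result.
// Assumption: when the request is allowed, the reset time is approximated
// as one full window from now.
function buildRateLimitHeaders(result, limit, windowMs, now = Date.now()) {
  const resetInMs = result.allowed ? windowMs : result.retryAfter;
  const headers = {
    'X-RateLimit-Limit': String(limit),
    'X-RateLimit-Remaining': String(Math.max(0, result.remaining)),
    'X-RateLimit-Reset': String(Math.ceil((now + resetInMs) / 1000)),
  };
  if (!result.allowed) {
    // Retry-After is expressed in whole seconds
    headers['Retry-After'] = String(Math.ceil(result.retryAfter / 1000));
  }
  return headers;
}

const rejected = { allowed: false, remaining: 0, retryAfter: 12_300 };
console.log(buildRateLimitHeaders(rejected, 100, 60_000, 1_700_000_000_000));
// {
//   'X-RateLimit-Limit': '100',
//   'X-RateLimit-Remaining': '0',
//   'X-RateLimit-Reset': '1700000013',
//   'Retry-After': '13'
// }
```

The returned object can be passed straight to `res.set(...)` in an Express handler.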

4.2 Tiered Rate Limiting Policies

Not every request should be treated the same. Production rate limiting needs tiered policies:

// Tiered rate limiting policies
const rateLimitPolicies = {
  // Free users: strict limits
  free: { requests: 100, windowMs: 60000 },      // 100 requests per minute
  // Paid users: relaxed limits
  pro: { requests: 1000, windowMs: 60000 },       // 1000 requests per minute
  // Enterprise users: high quota
  enterprise: { requests: 10000, windowMs: 60000 }, // 10000 requests per minute
  // Internal services: unlimited
  internal: { requests: Infinity, windowMs: 60000 }
};

function getRateLimit(req) {
  const tier = req.user?.tier || 'free';
  // Fall back to the free tier if the tier name is not recognized
  const policy = rateLimitPolicies[tier] ?? rateLimitPolicies.free;

  // Extra limits for sensitive endpoints
  if (req.path.includes('/auth/login')) {
    return { requests: 5, windowMs: 300000 }; // login: 5 per 5 minutes
  }
  if (req.path.includes('/upload')) {
    return { requests: 10, windowMs: 3600000 }; // upload: 10 per hour
  }

  return policy;
}

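4.3 Production Pitfalls

When implementing rate limiting in production, watch out for the following pitfalls:

❌ Pitfall 1: Limiting by IP only

Many developers use the IP address as the only rate limit key. This breaks down in several cases:

Corporate NAT: hundreds of people share one egress IP, so one heavy user gets the whole company blocked
CDN proxy: every request appears to come from the CDN's IP
IPv6: a single client can usually draw on a huge number of addresses (often an entire /64), so per-address limits are easy to sidestep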

✅ **Do this instead:** Use IP-based limits as coarse protection, and API keys or user IDs for fine-grained limits.
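One way to express this is a key builder that prefers the most specific identity available. This is a sketch under assumptions: `rateLimitKey` is a hypothetical helper, `req.user` is assumed to be populated by your authentication middleware, and the API key is assumed to arrive in an `x-api-key` header.

```javascript
// Prefer the most specific identity: user ID, then API key, then IP.
function rateLimitKey(req) {
  if (req.user?.id) return `user:${req.user.id}`;
  const apiKey = req.headers?.['x-api-key'];
  if (apiKey) return `apikey:${apiKey}`;
  return `ip:${req.ip}`;
}

// Plain objects standing in for Express requests
console.log(rateLimitKey({ user: { id: 42 }, headers: {}, ip: '203.0.113.7' }));
// user:42
console.log(rateLimitKey({ headers: { 'x-api-key': 'abc' }, ip: '203.0.113.7' }));
// apikey:abc
console.log(rateLimitKey({ headers: {}, ip: '203.0.113.7' }));
// ip:203.0.113.7
```

Behind a CDN or reverse proxy, `req.ip` only reflects the real client when Express's `trust proxy` setting is configured. Otherwise it is the proxy's address.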

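❌ Pitfall 2: Keeping limiter state in each instance's memory

Once the service scales to multiple instances, each instance's limiter is independent. 10 instances × 100 requests per minute each = 1,000 requests per minute in practice.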

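✅ **Do this instead:** Use external storage such as Redis for centralized rate limiting, or accept the precision loss of approximate limiting (each instance enforces total quota / number of instances).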
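The arithmetic behind both options fits in a few lines. The helpers below are hypothetical and assume that the instance count is known (for example from service discovery) and that the load balancer spreads each client's requests evenly.

```javascript
// Worst-case total a client can reach when every instance enforces `limit` locally.
function effectiveGlobalLimit(limit, instanceCount) {
  return limit * instanceCount;
}

// Approximate limiting: split the global quota across instances.
function perInstanceLimit(totalLimit, instanceCount) {
  if (instanceCount < 1) throw new RangeError('instanceCount must be >= 1');
  return Math.max(1, Math.floor(totalLimit / instanceCount));
}

console.log(effectiveGlobalLimit(100, 10)); // 1000
console.log(perInstanceLimit(100, 10));     // 10
```

The trade-off is that a client pinned to a single instance (sticky sessions, uneven balancing) is cut off at 10 requests instead of 100, which is the precision loss mentioned above.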

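❌ Pitfall 3: Forgetting to limit WebSocket and SSE

HTTP APIs usually sit behind rate-limiting middleware, but WebSocket messages and SSE connections are often forgotten. A malicious client can open thousands of WebSocket connections.

✅ **Do this instead:** For WebSocket, limit both the number of connections (for example, at most 5 per IP) and the message rate (for example, at most 10 messages per second).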

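// WebSocket connection limiting example (uses the `ws` package)
const { WebSocketServer } = require('ws');

const wss = new WebSocketServer({ port: 8080 }); // port is an arbitrary example
const wsConnections = new Map(); // ip → open connection count

wss.on('connection', (ws, req) => {
  const ip = req.socket.remoteAddress;
  const count = wsConnections.get(ip) || 0;

  if (count >= 5) {
    ws.close(4000, 'Too many connections');
    return;
  }

  wsConnections.set(ip, count + 1);
  ws.on('close', () => {
    const remaining = (wsConnections.get(ip) || 1) - 1;
    // Delete empty entries so the map does not grow without bound
    if (remaining <= 0) wsConnections.delete(ip);
    else wsConnections.set(ip, remaining);
  });
});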
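The example above only covers the connection count. For the message-rate half, one possible sketch is a small per-connection token bucket. `MessageRateLimiter` is a hypothetical class, and the clock is injectable so the behaviour can be checked without waiting.

```javascript
// Per-connection message limiter: a continuous-refill token bucket.
class MessageRateLimiter {
  constructor(maxPerSecond, clock = Date.now) {
    this.capacity = maxPerSecond;
    this.tokens = maxPerSecond;
    this.refillPerMs = maxPerSecond / 1000;
    this.clock = clock;
    this.last = clock();
  }

  tryConsume() {
    const now = this.clock();
    // Credit tokens for the time elapsed since the last call, up to capacity
    this.tokens = Math.min(this.capacity, this.tokens + (now - this.last) * this.refillPerMs);
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Deterministic check with a fake clock: 10 messages per second
let wsNow = 0;
const msgLimiter = new MessageRateLimiter(10, () => wsNow);
const firstBurst = Array.from({ length: 12 }, () => msgLimiter.tryConsume());
console.log(firstBurst.filter(Boolean).length); // 10

// Sketch of wiring it into a `ws` connection handler (`wss` assumed to exist):
// wss.on('connection', (ws) => {
//   const limiter = new MessageRateLimiter(10);
//   ws.on('message', (data) => {
//     if (!limiter.tryConsume()) return ws.close(1008, 'Message rate exceeded');
//     // handle the message
//   });
// });
```

Closing the socket with 1008 (policy violation) is one choice. Silently dropping the excess messages is another, depending on how strict you want to be.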

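✅ 5. Summary and Recommended Tools

Rate limiting is not a feature you bolt on and forget; it has to be designed around the business scenario. Some recommendations: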

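Scenario	Recommended approach	Recommended tools
Small project / monolith	In-memory sliding window counter	`bottleneck`, `express-rate-limit`
Medium project / multiple instances	Centralized rate limiting with Redis	`ioredis` + Lua scripts, `rate-limiter-flexible`
Large project / microservices	Gateway-layer limiting plus application-layer rules	Nginx `limit_req`, Kong, Envoy
Edge computing / global deployment	CDN rate limiting + edge Workers	Cloudflare Rate Limiting, Durable Objects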

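⚡ **Key takeaway:** The purpose of rate limiting is to protect service availability, not to punish users. A good policy is invisible to normal users and only kicks in for abnormal traffic. Always tell clients their remaining quota and reset time in the response headers: it is good API etiquette, and it is what top-tier APIs such as Stripe and GitHub do.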

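🔧 rate-limiter-flexible: a full-featured Node.js rate-limiting library with Redis, Cluster, and in-memory backends
🔧 bottleneck: a lightweight Node.js rate limiter with concurrency control
🔧 Cloudflare Rate Limiting: edge-layer rate limiting with simple configuration
🔧 Nginx limit_req: gateway-layer rate limiting with very high performance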

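If you need to inspect 429 responses while testing your rate limits, the JSON formatter at jsjson.com can pretty-print the response body so you can see exactly what the limiter returned.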