Methodology. This tutorial synthesizes Upstash’s @upstash/ratelimit documentation as of May 2026 from upstash.com/docs/ratelimit and the canonical Vercel + Upstash integration patterns at upstash.com/docs/redis/sdks/ts/overview. For background on what rate limiting is and why it matters, see what is API rate limiting.

Rate limiting is the cheapest insurance you can buy for an API. It blunts brute-force attacks, caps the bill on AI endpoints when someone scripts your free tier, and gives you a graceful way to enforce plan tiers without rewriting business logic. The standard 2026 pattern for Next.js apps is Upstash Redis plus the @upstash/ratelimit SDK — serverless Redis over HTTP, no connection pool to manage, a free tier that covers most early-stage SaaS, and ergonomics that fit cleanly into both the App Router and Next.js middleware.

This guide walks the canonical setup: pick the right backend, wire Upstash, build the three limiter shapes you actually need (fixed-window, sliding-window, plan-tier), and apply them either globally via middleware or per-route. It ends with the response-header conventions, the edge cases that bite founders in production, and the auth-endpoint trap to avoid.

1 Why Upstash + serverless Redis is the canonical pattern

Three other approaches exist; each has a real reason it loses to Upstash for a Next.js SaaS on Vercel:

  • In-memory rate limiting (a Map per process) works on a single long-running server. On Vercel, every function invocation may run in a different isolate — the counter resets between requests, and the limit becomes a suggestion. Useless for serverless.
  • Cloudflare KV is eventually consistent. A user hitting the limit on one edge node may see fresh quota on another. Acceptable for caching, dangerous for limit enforcement.
  • Database-based limiting (a requests table in Postgres) is consistent and cheap to start, but every limit check is a write to your primary database. On a high-traffic endpoint, that’s real load on a resource you want to keep light.
  • Upstash Redis over REST is strongly consistent within a region, has single-digit-millisecond latency from Vercel functions co-located in the same region, and the SDK handles the atomic increment-and-check primitive Redis is famous for. The free tier covers 10K commands/day — more than enough for early traction.

The Upstash @upstash/ratelimit SDK ships with three algorithms (fixed window, sliding window, token bucket) implemented as Lua scripts that run atomically inside Redis. That atomic property is what makes the limit honest under concurrent requests — two requests that both ask “am I under the limit?” at the same millisecond can’t both get a yes.
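To see why atomicity matters, here is the race a naive read-then-write check exposes — an in-memory illustration (not the SDK's actual implementation), with the read and the write split into phases the way two separate network round-trips would split them:

```typescript
// Naive limiter state: one shared counter, with read and write as separate
// steps, the way a GET followed by a SET over the network would be.
let count = 0;

function readCounter(): number {
  return count; // first round-trip: read the current value
}

function checkAndWrite(observed: number, limit: number): boolean {
  if (observed >= limit) return false;
  count = observed + 1; // second round-trip: write back (lost-update hazard)
  return true;
}

// Two concurrent requests both read before either writes:
const seenByA = readCounter(); // 0
const seenByB = readCounter(); // 0
const allowedA = checkAndWrite(seenByA, 1); // true
const allowedB = checkAndWrite(seenByB, 1); // true — a limit of 1 admitted 2
```

An atomic increment-and-compare executed inside Redis collapses the read and the write into one step, so the second request sees the incremented counter and is rejected.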

2 Set up an Upstash Redis database

From the Upstash console, create a new Redis database. Pick the region closest to your Vercel deployment region (usually us-east-1 for the iad1 Vercel default). Choose the global option only if your app deploys to multiple regions and you can tolerate slightly higher write latency.

The free tier published on upstash.com/pricing as of May 2026 includes 10,000 commands per day and 256 MB of max data size. Each rate-limit check is one or two Redis commands depending on the algorithm, so 10K/day covers roughly 5K–7K limit checks per day — enough for most products until they have meaningful traffic.

Once the database is created, copy the REST URL and REST token from the dashboard and add them to your environment:

# .env.local
UPSTASH_REDIS_REST_URL=https://us1-example.upstash.io
UPSTASH_REDIS_REST_TOKEN=AX1lASQg_your_long_token_here

Add the same two variables to your Vercel project (Settings → Environment Variables) so the limiter works in production. The REST URL and token are scoped to a single database; use a separate database (and a separate token) for staging if you want isolation.

3 Install @upstash/ratelimit and @upstash/redis

npm install @upstash/ratelimit @upstash/redis

Both packages are pure-TypeScript and have no native dependencies, so they work in every Next.js runtime — the Node runtime, the Edge runtime, server actions, and middleware. That uniformity is part of why this pattern is canonical: the same limiter object can guard a route handler, a server action, and a middleware check without changing the import.

4 Build a fixed-window limiter

The simplest pattern, suitable for endpoints where slight burst-at-window-boundary behavior is fine. Fixed-window means “allow N requests per W seconds, counted from the top of the window.” The counter resets to zero when the window rolls over.

// src/lib/ratelimit.ts
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const redis = Redis.fromEnv(); // reads UPSTASH_REDIS_REST_URL + TOKEN

export const fixedLimiter = new Ratelimit({
  redis,
  limiter: Ratelimit.fixedWindow(100, '1 h'),
  analytics: true,
  prefix: 'rl:fixed'
});

Use it in any route handler:

// src/app/api/posts/route.ts
import { NextResponse } from 'next/server';
import { fixedLimiter } from '@/lib/ratelimit';

export async function POST(req: Request) {
  const ip = req.headers.get('x-forwarded-for')?.split(',')[0].trim() ?? 'anonymous';
  const { success, limit, remaining, reset } = await fixedLimiter.limit(ip);

  if (!success) {
    return NextResponse.json(
      { error: 'rate_limited' },
      {
        status: 429,
        headers: {
          'Retry-After': Math.ceil((reset - Date.now()) / 1000).toString(),
          'X-RateLimit-Limit': limit.toString(),
          'X-RateLimit-Remaining': remaining.toString(),
          'X-RateLimit-Reset': new Date(reset).toISOString()
        }
      }
    );
  }

  // ... do the work
  return NextResponse.json({ ok: true });
}

The downside of fixed windows: a malicious client can fire 100 requests at 13:59:59 and another 100 at 14:00:00, getting 200 requests in two seconds. For most internal endpoints that’s a non-issue. For endpoints where smoothness matters (login, password reset, AI completions), use the sliding window in Step 5.
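The boundary behavior falls out of how the window is identified — a sketch of the window-bucketing arithmetic (illustrative, not the SDK's code):

```typescript
// A fixed window is identified by flooring the timestamp to the window size;
// requests one second apart can land in different buckets with fresh budgets.
const windowMs = 60 * 60 * 1000; // 1 hour

function windowId(timestampMs: number): number {
  return Math.floor(timestampMs / windowMs);
}

const t1 = Date.UTC(2026, 4, 7, 13, 59, 59); // 13:59:59
const t2 = Date.UTC(2026, 4, 7, 14, 0, 0);   // 14:00:00, one second later

// Different window IDs → separate counters → 100 + 100 requests in 2 seconds.
const separateBudgets = windowId(t1) !== windowId(t2);
```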

5 Build a sliding-window limiter

Sliding window is the recommended default for most APIs. It interpolates between two adjacent fixed windows, weighting requests in the previous window by how much of it has elapsed. The user experience is “the limit smooths over time” instead of “the limit resets at the top of the hour.”

// src/lib/ratelimit.ts (additions)
export const slidingLimiter = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(100, '1 h'),
  analytics: true,
  prefix: 'rl:sliding'
});

Same call-site signature as the fixed-window limiter — await slidingLimiter.limit(key). The only difference is which constructor you pass to the limiter field. The atomic Lua script handles all the math inside Redis; you just see the boolean result.

The cost of sliding window is one extra Redis command per check (the SDK reads the current and previous window counters and computes the weighted total). That moves you from one to two commands per limit check — still well inside the free tier’s 10K/day for typical SaaS traffic.
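The interpolation described above can be written down directly — a sketch of the weighted-count approximation (my reading of the algorithm, not the SDK's Lua source):

```typescript
// Sliding-window approximation: weight the previous window's count by the
// fraction of it still inside the sliding window, then add the current count.
function slidingCount(
  prevWindowCount: number, // requests counted in the previous fixed window
  currWindowCount: number, // requests counted in the current fixed window
  elapsedFraction: number  // how far into the current window we are, 0..1
): number {
  return prevWindowCount * (1 - elapsedFraction) + currWindowCount;
}

// Halfway into the current window, half of the previous window still counts:
const estimate = slidingCount(80, 30, 0.5); // 80 * 0.5 + 30 = 70
const allowed = estimate < 100;             // under a 100-per-window limit
```

As the current window fills up, the previous window's contribution decays smoothly to zero, which is exactly the "smooths over time" behavior described above.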

6 Build a plan-tier limiter

Once you’re selling subscriptions, “100 requests per hour for everyone” stops making sense. Free users get a low cap (and feel the friction nudging them to upgrade); Pro users get a real budget; Business users get a budget large enough that they almost never see it. Implement this by reading the user’s plan from the JWT, picking the matching limiter, and keying by user ID instead of IP.

// src/lib/ratelimit.ts (additions)
export const limiterByPlan = {
  free: new Ratelimit({
    redis,
    limiter: Ratelimit.slidingWindow(100, '1 h'),
    prefix: 'rl:free'
  }),
  pro: new Ratelimit({
    redis,
    limiter: Ratelimit.slidingWindow(1_000, '1 h'),
    prefix: 'rl:pro'
  }),
  business: new Ratelimit({
    redis,
    limiter: Ratelimit.slidingWindow(10_000, '1 h'),
    prefix: 'rl:business'
  })
} as const;

export type Plan = keyof typeof limiterByPlan;

// src/app/api/ai/complete/route.ts
import { NextResponse } from 'next/server';
import { limiterByPlan, type Plan } from '@/lib/ratelimit';
import { getUserFromJwt } from '@/lib/auth';

export async function POST(req: Request) {
  const user = await getUserFromJwt(req);
  if (!user) {
    return NextResponse.json({ error: 'unauthorized' }, { status: 401 });
  }

  const plan: Plan = user.plan ?? 'free';
  const limiter = limiterByPlan[plan];
  const { success, limit, remaining, reset } = await limiter.limit(user.id);

  if (!success) {
    return NextResponse.json(
      { error: 'rate_limited', plan, limit },
      {
        status: 429,
        headers: {
          'Retry-After': Math.ceil((reset - Date.now()) / 1000).toString(),
          'X-RateLimit-Limit': limit.toString(),
          'X-RateLimit-Remaining': remaining.toString()
        }
      }
    );
  }

  // ... call OpenAI / Anthropic / your AI provider
  return NextResponse.json({ ok: true });
}

The keying choice matters. Key by user ID for authenticated routes — one user across multiple devices shares one budget, which is what your billing semantics want. Key by IP only for unauthenticated routes (the signup endpoint, the public marketing-page contact form). Key by API key for B2B API surfaces where customers each get a programmatic key. The combination most apps end up with is "authenticated → user ID, anonymous → IP":

function rateLimitKey(user: { id: string } | null, req: Request): string {
  if (user) return `user:${user.id}`;
  const ip = req.headers.get('x-forwarded-for')?.split(',')[0].trim()
    ?? req.headers.get('x-real-ip')
    ?? 'unknown';
  return `ip:${ip}`;
}

For where the JWT-based getUserFromJwt comes from, see how to add OAuth to your SaaS.

7 Apply via Next.js middleware or per-route

Two integration shapes, with a real trade-off between them.

Middleware: global, edge, fast

Next.js middleware runs before any route handler and can short-circuit a request with a 429 before the route is even invoked. This is the right place for IP-based abuse protection on every public endpoint — it’s applied uniformly, lives in one file, and never gets forgotten on a new route.

// src/middleware.ts
import { NextResponse, type NextRequest } from 'next/server';
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const ipLimiter = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(60, '1 m'),
  prefix: 'rl:mw'
});

export async function middleware(req: NextRequest) {
  // Only rate-limit API routes from this middleware.
  if (!req.nextUrl.pathname.startsWith('/api/')) {
    return NextResponse.next();
  }

  const ip = req.headers.get('x-forwarded-for')?.split(',')[0].trim()
    ?? req.headers.get('x-real-ip')
    ?? 'unknown';

  const { success, limit, remaining, reset } = await ipLimiter.limit(ip);

  const res = success
    ? NextResponse.next()
    : NextResponse.json({ error: 'rate_limited' }, { status: 429 });

  res.headers.set('X-RateLimit-Limit', limit.toString());
  res.headers.set('X-RateLimit-Remaining', remaining.toString());
  res.headers.set('X-RateLimit-Reset', new Date(reset).toISOString());
  if (!success) {
    res.headers.set('Retry-After',
      Math.ceil((reset - Date.now()) / 1000).toString());
  }

  return res;
}

export const config = {
  matcher: ['/api/:path*']
};

Per-route: surgical, plan-aware

For plan-tier limits or endpoints that need different budgets per route (an AI completion endpoint at 100/hour Pro, a search endpoint at 1000/hour Pro), apply the limiter inside the route handler as in Step 6. The middleware can’t cleanly read your auth state on the way through — it can verify a JWT, but it can’t join against your users table to read a plan field. Per-route is where plan tier actually lives.

The pragmatic stack: middleware for coarse IP-based abuse protection (everyone gets 60/minute, no exceptions), per-route for plan-tier and per-feature limits.

8 Set the proper 429 response

The HTTP spec is unambiguous. When a request is rate-limited:

  • Status code 429. “Too Many Requests”. Not 403, not 503, not 502.
  • Retry-After header. Number of seconds the client should wait before retrying. Compute as ceil((reset - Date.now()) / 1000).
  • X-RateLimit-Limit header. The total quota for the current window. Conventional but not standardized.
  • X-RateLimit-Remaining header. How many requests are left in the current window. Send on every response, not just 429s — well-behaved clients use it to back off proactively.
  • X-RateLimit-Reset header. Either a Unix timestamp or an ISO 8601 string indicating when the quota resets.

A response body that names the cause helps debugging on the client side:

{
  "error": "rate_limited",
  "message": "You have exceeded the limit of 100 requests per hour.",
  "limit": 100,
  "remaining": 0,
  "resetAt": "2026-05-07T15:00:00.000Z"
}

Sending the headers on every response (success or 429) lets sophisticated clients pace themselves before they hit the wall — which improves the user experience of well-behaved customers and lets you keep your limits tight without alienating them.
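One way to keep the convention consistent across routes is a small helper that stamps the header set onto any response — a sketch using the standard Headers API (the helper name and shape are mine, not from the SDK):

```typescript
// Attach the conventional rate-limit headers to an existing Headers object.
// `reset` is a millisecond timestamp, as @upstash/ratelimit returns it.
function setRateLimitHeaders(
  headers: Headers,
  info: { limit: number; remaining: number; reset: number },
  limited: boolean
): Headers {
  headers.set('X-RateLimit-Limit', info.limit.toString());
  headers.set('X-RateLimit-Remaining', info.remaining.toString());
  headers.set('X-RateLimit-Reset', new Date(info.reset).toISOString());
  if (limited) {
    // Only 429 responses need Retry-After; clamp to avoid negative values.
    const seconds = Math.max(0, Math.ceil((info.reset - Date.now()) / 1000));
    headers.set('Retry-After', seconds.toString());
  }
  return headers;
}

const h = setRateLimitHeaders(
  new Headers(),
  { limit: 100, remaining: 0, reset: Date.now() + 30_000 },
  true
);
```

Calling it on both the success and the 429 path (with `limited` flipped) keeps the two branches from drifting apart.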

Edge cases that bite in production

IP-based vs user-based vs API-key-based keys

Authenticated users should be keyed by user ID, not IP. Two reasons: a single user behind a corporate NAT shouldn’t share a budget with their colleagues, and a malicious user can’t bypass their limit by switching IPs. Anonymous traffic is keyed by IP. B2B API customers are keyed by their API key, which lets you sell rate-limit tiers as part of the plan upgrade ladder.

Never rate-limit auth endpoints too aggressively

The login, password-reset, and signup endpoints have a different threat model from the rest of your API. Aggressive limiting locks legitimate users out — a parent helping their teenager log in three times in a row, a customer mistyping their password, a marketing campaign that spikes signup traffic. Use a higher and more forgiving limit on auth endpoints, and consider per-account limiting (e.g. five failed password attempts per email address per fifteen minutes) instead of, or in addition to, per-IP. Lockout should be a last resort, not a default. The webhook security guide covers a related “how strict is too strict” question for inbound webhooks.
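The per-account shape can be sketched with an in-memory tracker — illustrative only (production would keep this state in Redis so it survives across serverless invocations) — showing the key property that only failures consume budget and a success clears the slate:

```typescript
// Per-email failed-login tracker: five failures per fifteen minutes,
// counted per account rather than per IP.
const MAX_FAILURES = 5;
const WINDOW_MS = 15 * 60 * 1000; // fifteen minutes

const failures = new Map<string, { count: number; windowStart: number }>();

function isLockedOut(email: string, now = Date.now()): boolean {
  const entry = failures.get(email);
  if (!entry || now - entry.windowStart > WINDOW_MS) return false;
  return entry.count >= MAX_FAILURES;
}

function recordFailure(email: string, now = Date.now()): void {
  const entry = failures.get(email);
  if (!entry || now - entry.windowStart > WINDOW_MS) {
    failures.set(email, { count: 1, windowStart: now });
  } else {
    entry.count += 1;
  }
}

function recordSuccess(email: string): void {
  failures.delete(email); // a good login forgives earlier typos
}
```

Keying by email rather than IP keeps the shared office NAT usable while a credential-stuffing run against one account still gets stopped.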

The per-request latency cost

Every limiter check is an HTTPS round-trip to Upstash. Co-located in the same region as your Vercel functions, that round-trip is typically single-digit milliseconds; cross-region it can climb into the tens of milliseconds — and it is paid on every request the limiter checks. For a fast endpoint that returns in 50ms, that overhead is real. Mitigations: keep your Upstash region close to your Vercel region, batch limit checks if a single request triggers multiple limited operations, and skip the limit check on hot internal traffic that doesn’t need it.

Rate-limiting incoming webhooks

Inbound webhooks (Stripe, GitHub, etc.) are not the place for aggressive rate limiting. The senders retry with their own backoff, and a 429 from your endpoint can cause a webhook to be marked as failed and disabled by the sender. Validate the signature, deduplicate by event ID, and trust the sender’s rate. If you need to throttle processing of webhooks (because each event triggers expensive work), enqueue and drain at your own pace — don’t reject the inbound delivery.
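The accept-then-drain shape takes only a few lines — in-memory here for illustration (production would use a Redis SET NX for the dedupe and a real queue for the backlog):

```typescript
// Accept every delivery, dedupe by event ID, and queue for later processing —
// the sender always gets a 2xx, never a 429.
const seenEventIds = new Set<string>();
const queue: Array<{ id: string; payload: unknown }> = [];

function acceptWebhook(id: string, payload: unknown): number {
  if (seenEventIds.has(id)) {
    return 200; // duplicate delivery: acknowledge, do nothing
  }
  seenEventIds.add(id);
  queue.push({ id, payload }); // drained at your own pace elsewhere
  return 200;
}
```

Signature validation would run before any of this; the point is that throttling lives in the drain loop, not in the response code.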

The downstream client-side problem

You’re also the client of someone else’s rate-limited API — OpenAI, Anthropic, Stripe, Resend. When you call them and they return 429, retry with exponential backoff and jitter. Jitter is the random component that prevents a thundering herd: if a thousand of your concurrent requests all retry at exactly 1.0s, you re-hit the limit. Add a random factor:

// src/lib/retry.ts
export async function withRetry<T>(
  fn: () => Promise<T>,
  opts: { maxAttempts?: number; baseMs?: number } = {}
): Promise<T> {
  const { maxAttempts = 5, baseMs = 250 } = opts;
  let attempt = 0;
  while (true) {
    try {
      return await fn();
    } catch (err: any) {
      attempt += 1;
      const status = err?.status ?? err?.response?.status;
      if (status !== 429 || attempt >= maxAttempts) throw err;
      const backoff = baseMs * 2 ** (attempt - 1);
      const jitter = Math.random() * backoff;
      await new Promise(r => setTimeout(r, backoff + jitter));
    }
  }
}

Honor the Retry-After header if the upstream sends one — it tells you exactly how long to wait, and using a smaller value risks compounding the limit. The best SaaS tools roundup has the broader vendor list where this matters.
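Retry-After can arrive as delta-seconds or as an HTTP-date, so a small parser is worth having — a sketch (the helper name is mine) that handles both forms:

```typescript
// Parse a Retry-After header value into milliseconds to wait.
// The header may be delta-seconds ("30") or an HTTP-date.
function retryAfterMs(value: string, now = Date.now()): number | null {
  const seconds = Number(value);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(value);
  if (Number.isNaN(date)) return null; // unparseable: fall back to backoff
  return Math.max(0, date - now);
}
```

In the retry loop above, a non-null result from this parser would replace the computed backoff for that attempt.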

Common mistakes

Keying by IP for authenticated endpoints

Two users behind the same NAT share a budget; a malicious user bypasses their limit by switching IPs. Key authenticated endpoints by user ID. Always.

Picking fixed-window for sensitive endpoints

Fixed windows allow burst-at-boundary patterns. For login, password reset, and AI completions, sliding window costs one extra Redis command and prevents the trivial bypass.

Forgetting Retry-After

Without the Retry-After header, well-behaved clients can’t back off correctly — they have to guess. Send it on every 429.

Locking out legitimate auth traffic

Aggressive limits on the login endpoint create lockout incidents that look like outages. Be loose on auth; layer per-account limiting and CAPTCHA above per-IP for stronger protection without the false positives.

Rejecting webhooks with 429

Senders may disable a webhook endpoint that returns 429 too often. Accept the delivery, enqueue, and process at your own pace.

Using the same Redis database for cache and rate limit

It works, but storage pressure on the cache (large objects, high churn) can evict rate-limit keys early. Either use prefixes carefully and watch eviction, or keep two Upstash databases — one for cache, one for rate limits. The free tiers stack.

Summary
Upstash → sliding window → user-keyed → plan-tiered → 429 with headers

Upstash plus @upstash/ratelimit is the canonical 2026 rate-limit stack for a Next.js SaaS on Vercel. Use sliding-window over fixed-window for anything sensitive, key authenticated endpoints by user ID, layer plan-tier limits on top of a coarse IP-based middleware floor, and send the proper 429 with Retry-After and X-RateLimit-* headers on every response. The free tier carries you well past launch; paid tiers scale linearly without operational work.
