# Handle 429 rate limit errors

Source: https://developer.nylas.com/docs/cookbook/use-cases/build/handle-rate-limit-errors/

Your batch job is humming along, then a wall of `429 Too Many Requests` responses stops it cold. You're hitting a rate limit, either one the API enforces or one the underlying provider enforces. A `429` isn't a failure you log and forget. It's a signal to slow down, wait the right amount of time, and retry.

This recipe is the handling playbook: how to detect a `429`, how to read the `Retry-After` header, and how to back off with exponential delay and jitter so your retries don't stampede. For the limit numbers themselves, see [API and provider rate limits](/docs/dev-guide/platform/rate-limits/).

## What does a 429 response mean?

A `429` response means you sent too many requests in a time window, and the server is refusing more until you slow down. The API returns a consistent JSON error object with a `message` field. The same status code covers several causes: an account throttled by its provider, a Gmail quota exhausted, a Microsoft mailbox over its concurrency limit, or too many calls to the API itself.

The error body tells you which cause you hit. The reference page documents seven distinct `429` variants, from `Account throttled` to `Application is over its MailboxConcurrency limit`, and a Microsoft send throttle can last around 20 minutes. You don't need to branch on every variant. Treat them all as "wait, then retry", and read the `Retry-After` header when it's present.

The cause matters for how long you wait, not for whether you retry. A Gmail `Resource exhausted` clears within seconds because Google's quota refills every second, so a short backoff recovers fast. An `Exchange account throttled` response is the opposite: the Exchange server pauses sync and returns `429` for any send for roughly 20 minutes, so retrying every few seconds wastes attempts. Read the body to set expectations, then let `Retry-After` and your backoff cap decide the actual delay.

```bash
# A throttled request returns 429 with a JSON error body.
curl -i --request GET \
  --url 'https://api.us.nylas.com/v3/grants/<GRANT_ID>/messages?limit=200' \
  --header 'Authorization: Bearer <NYLAS_API_KEY>'

# HTTP/2 429
# retry-after: 5
# {"request_id":"...","error":{"type":"provider_error","message":"Account throttled"}}
```
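The idea that the cause sets how long you should expect to wait can be sketched in a few lines of Python. This is a minimal illustration, not part of any SDK: `max_wait_for`, `WAIT_CAPS`, and `DEFAULT_CAP` are hypothetical names, the substring matches assume the message text contains the variant names quoted above, and the 8-second Gmail cap is an illustrative guess rather than a documented value.

```python
import json

# Assumed ceilings, in seconds, keyed by substrings of the error message.
# The 20-minute figure comes from the Exchange throttle described above;
# the short Gmail value is an illustrative guess, not a documented number.
WAIT_CAPS = {
    "exchange account throttled": 20 * 60,
    "resource exhausted": 8,
}
DEFAULT_CAP = 32  # hypothetical default cap for every other variant


def max_wait_for(body_text):
    """Return a backoff cap in seconds based on the 429 error message."""
    try:
        message = json.loads(body_text)["error"]["message"].lower()
    except (ValueError, KeyError, TypeError, AttributeError):
        return DEFAULT_CAP  # unparseable body: use the default cap
    for needle, cap in WAIT_CAPS.items():
        if needle in message:
            return cap
    return DEFAULT_CAP
```

The cap only bounds your own backoff. When the response carries `Retry-After`, that value still takes priority.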

For the full list of causes and fixes, see the [client error reference for 400-499](/docs/api/errors/400-response/).

## How do I read the Retry-After header?

When a provider throttles a request, the API returns a `Retry-After` header with the number of seconds to wait before retrying. Microsoft and EWS responses include it; honor that value exactly instead of guessing a delay. When the header is absent, fall back to your own exponential backoff schedule starting around 1 second.

`Retry-After` is the provider telling you precisely how long it needs. Ignoring it and retrying early just earns another `429` and can extend the throttle. The Node and Python examples below check for the header first, parse it as an integer number of seconds, and only compute a backoff delay when it's missing.

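```js [retryAfter-Node.js]
// Prefer the server's Retry-After value; fall back to computed backoff.
function getDelayMs(response, attempt) {
  const header = response.headers.get("retry-after");
  const seconds = Number.parseInt(header ?? "", 10);
  // HTTP also allows a date in Retry-After; treat anything non-numeric as absent
  // so a NaN delay never turns into an immediate retry.
  if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  return backoffWithJitter(attempt); // defined below
}
```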

```python
# Prefer the server's Retry-After value; fall back to computed backoff.
def get_delay_seconds(response, attempt):
    header = response.headers.get("Retry-After")
    if header:
        try:
            return max(0, int(header))
        except ValueError:
            pass  # HTTP also allows a date here; fall through to backoff
    return backoff_with_jitter(attempt)  # defined below
```

Microsoft allows up to 4 concurrent requests per mailbox, so a `429` here often means too much parallelism rather than too many total calls. See [Microsoft rate limits](/docs/dev-guide/platform/rate-limits/#microsoft-rate-limits) for the concurrency and 10,000-requests-per-10-minute window.

EWS behaves the same way: an on-premises Exchange administrator sets the throttle, so the API can't know the ceiling in advance, but it forwards the server's `Retry-After` value when the server throttles a request.

Two response headers let you act before a `429` ever fires. `Nylas-Provider-Request-Count` reports how many provider calls a single request consumed, and `Nylas-Gmail-Quota-Usage` reports how many Gmail quota units a Drafts, Messages, Threads, Folders, or Attachments request spent. Read both on successful responses, not just failures, so you can taper your request rate while you still have headroom rather than after the provider cuts you off.
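One way to act on those two headers is to parse them on every response and keep a rolling tally. The sketch below assumes both header values are plain integers and that `headers` is a mapping such as the one the `requests` library exposes. `provider_usage` and `QuotaWindow` are hypothetical helper names, not part of any SDK.

```python
import time
from collections import deque


def provider_usage(headers):
    """Extract the two usage headers from a response, as ints or None."""
    def as_int(name):
        value = headers.get(name)
        try:
            return int(value)
        except (TypeError, ValueError):
            return None  # header missing or not a plain integer

    return {
        "provider_requests": as_int("Nylas-Provider-Request-Count"),
        "gmail_quota_units": as_int("Nylas-Gmail-Quota-Usage"),
    }


class QuotaWindow:
    """Sum of quota units spent in the trailing `window` seconds."""

    def __init__(self, window=60.0, clock=time.monotonic):
        self.window = window
        self.clock = clock
        self.events = deque()  # (timestamp, units), oldest first

    def record(self, units):
        self.events.append((self.clock(), units))

    def spent(self):
        cutoff = self.clock() - self.window
        # Drop entries that have aged out of the window.
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()
        return sum(units for _, units in self.events)
```

Recording `gmail_quota_units` after each successful call and comparing `spent()` against the per-user ceiling gives you a signal to slow down before the provider does it for you.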

## How do I retry with exponential backoff and jitter?

Exponential backoff doubles the wait after each failed attempt: roughly 1, 2, 4, 8 seconds. Jitter adds a random offset so many clients retrying at once don't all fire on the same tick and re-trigger the limit. Cap the delay (for example, at 32 seconds) and cap retries (5 is a sensible default) so a persistent `429` doesn't loop forever.

Without jitter, a fleet of workers that all got throttled at the same moment retries in lockstep and stampedes the API again. The functions below add randomness of up to 1 second on top of the exponential base, honor `Retry-After` when present, and give up after 5 retries (6 requests in total). Both target the same `GET /v3/grants/{grant_id}/messages` endpoint that returns up to 200 messages per page.

```js [backoff-Node.js]
const BASE_MS = 1000;
const MAX_MS = 32000;
const MAX_RETRIES = 5;

function backoffWithJitter(attempt) {
  const exp = Math.min(MAX_MS, BASE_MS * 2 ** attempt);
  return exp + Math.floor(Math.random() * 1000); // up to 1s jitter
}

async function listMessagesWithRetry(grantId, apiKey) {
  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
    const res = await fetch(
      `https://api.us.nylas.com/v3/grants/${grantId}/messages?limit=200`,
      { headers: { Authorization: `Bearer ${apiKey}` } },
    );
    if (res.status !== 429) return res;

    if (attempt === MAX_RETRIES) {
      throw new Error("Rate limited after 5 retries");
    }
    const delay = getDelayMs(res, attempt);
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```

```python
import random
import time

import requests

BASE = 1.0
MAX = 32.0
MAX_RETRIES = 5

def backoff_with_jitter(attempt):
    exp = min(MAX, BASE * 2 ** attempt)
    return exp + random.random()  # up to 1s jitter

def list_messages_with_retry(grant_id, api_key):
    url = f"https://api.us.nylas.com/v3/grants/{grant_id}/messages?limit=200"
    headers = {"Authorization": f"Bearer {api_key}"}

    # One initial request plus up to MAX_RETRIES retries.
    for attempt in range(MAX_RETRIES + 1):
        res = requests.get(url, headers=headers)
        if res.status_code != 429:
            return res

        if attempt == MAX_RETRIES:
            raise RuntimeError("Rate limited after 5 retries")
        time.sleep(get_delay_seconds(res, attempt))
```
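To see what that schedule costs in the worst case, it helps to print the base delays. The snippet below reuses the same constants as the examples above and assumes no `Retry-After` header is present. `base_delays` is a hypothetical helper added only for this illustration.

```python
BASE = 1.0
MAX = 32.0
MAX_RETRIES = 5


def base_delays(retries=MAX_RETRIES):
    """Exponential delays before jitter, one per retry."""
    return [min(MAX, BASE * 2 ** attempt) for attempt in range(retries)]


delays = base_delays()
print(delays)       # [1.0, 2.0, 4.0, 8.0, 16.0]
print(sum(delays))  # 31.0 seconds before jitter
```

With 5 retries the 32-second cap is never reached. It only starts to matter if you raise the retry count. Jitter adds less than 1 second per retry, so a request that is throttled every time gives up after roughly 31 to 36 seconds of waiting.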

The official Nylas SDKs raise exceptions on non-`2xx` responses, so wrap SDK calls in the same retry loop. For broader error strategy beyond rate limits, see [error monitoring and handling](/docs/dev-guide/best-practices/error-handling/).
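Wrapping an SDK call might look like the sketch below. It deliberately avoids naming any SDK exception class: `is_rate_limited` and `retry_after` are hypothetical callables you would write against whatever exception type your SDK version raises, and `call_with_retry` itself is an illustrative helper, not an SDK function.

```python
import random
import time


def call_with_retry(call, is_rate_limited, retry_after=lambda exc: None,
                    max_retries=5, base=1.0, cap=32.0, sleep=time.sleep):
    """Run call(); on a rate-limit exception, wait and retry."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as exc:
            # Re-raise anything that isn't a 429, and the final 429.
            if not is_rate_limited(exc) or attempt == max_retries:
                raise
            delay = retry_after(exc)  # seconds from Retry-After, or None
            if delay is None:
                delay = min(cap, base * 2 ** attempt) + random.random()
            sleep(delay)
```

Pass the SDK call as a zero-argument function, for example a `lambda`, and supply a predicate that inspects the exception's status code. Errors that the predicate rejects propagate immediately, so the wrapper never hides non-rate-limit failures.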

## How do provider quotas differ from Nylas limits?

Two separate ceilings apply to every request. The API enforces its own limits, such as up to 200 requests per grant per second on the [Messages endpoint](/docs/reference/api/messages/). Underneath, each provider enforces its own quota, and a single API call can fan out to several provider calls. You can hit a provider limit while staying well under the API rate limit.

Google and Microsoft count usage differently, which changes how you pace requests:

| Provider  | Quota model              | Key limit                                            | Header to watch                |
| --------- | ------------------------ | ---------------------------------------------------- | ------------------------------ |
| Google    | Per-user and per-project | 6,000 quota units/min per user; 1.2M/min per project | `Nylas-Gmail-Quota-Usage`      |
| Microsoft | Per-mailbox              | 10,000 requests/10 min; max 4 concurrent             | `Nylas-Provider-Request-Count` |

Because Microsoft counts per mailbox and Google counts per user and per project, a noisy single account throttles differently on each. Watch the `Nylas-Provider-Request-Count` header to see how many provider calls a request consumed, and space requests out accordingly.

Gmail's quota math is the part that surprises people. The 6,000 quota units per minute per user isn't 6,000 requests: each method spends a different number of units, so a fast loop of cheap calls and a slow loop of expensive ones can hit the same ceiling. A `messages.list` page costs 5 units, a `messages.get` body read costs 20, and a `messages.send` costs 100, so reading bodies costs 4x listing them and sending costs 20x. A single unified API call that lists messages can fan out internally to one `messages.list` plus one `messages.get` per message, which is why a `limit=200` list can burn provider quota far faster than 200 looks. For the full per-method cost table and a sync-budget worksheet, see [Gmail API quotas and limits](/docs/cookbook/email/gmail-api-quotas/).
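The arithmetic is easy to check. The sketch below uses the unit costs and the per-user ceiling quoted in the paragraph above, and assumes the worst-case fan-out of one `messages.get` per listed message. `list_page_cost` is a hypothetical helper for illustration.

```python
# Unit costs and the per-user ceiling as quoted in the paragraph above.
LIST_COST = 5
GET_COST = 20
PER_USER_UNITS_PER_MIN = 6_000


def list_page_cost(messages_on_page):
    """Units for one messages.list plus one messages.get per message."""
    return LIST_COST + messages_on_page * GET_COST


full_page = list_page_cost(200)
print(full_page)                                     # 4005
print(PER_USER_UNITS_PER_MIN // full_page)           # 1
print(PER_USER_UNITS_PER_MIN // list_page_cost(50))  # 5
```

Under those assumptions, a single full `limit=200` page consumes about two thirds of one user's per-minute budget, while pages of 50 fit five times into the same minute.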

## How do I queue requests to stay under the limit?

Queue work through a rate limiter that caps outbound requests below the ceiling instead of firing them as fast as your code generates them. A token-bucket or fixed-window limiter set to, for example, 150 requests per second per grant keeps you under the 200-per-second Messages limit with headroom for retries. Pair the queue with the backoff loop so throttled jobs return to the queue cleanly.

Queueing turns reactive retrying into proactive pacing. For Microsoft, keep in-flight requests at or below 4 per mailbox to respect the concurrency cap. For Google, throttle per user since the 6,000-quota-units-per-minute limit is per-user, not per-application. A single shared queue per grant gives you one place to tune both the rate and the concurrency.

```js [queue-Node.js]
// Minimal per-grant fixed-window limiter: resets to 150 permits each second.
function createLimiter(ratePerSecond = 150) {
  let tokens = ratePerSecond;
  // unref() so the refill timer alone doesn't keep the Node.js process alive.
  setInterval(() => (tokens = ratePerSecond), 1000).unref();

  return async function acquire() {
    // Poll until the next refill once the current window is spent.
    while (tokens <= 0) {
      await new Promise((resolve) => setTimeout(resolve, 50));
    }
    tokens -= 1;
  };
}
```

Combine the limiter with `listMessagesWithRetry` above: call `acquire()` before each request so you pace proactively and still recover from any `429` that slips through.
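For Python workers, a token bucket with continuous refill is one possible equivalent. This is a minimal single-threaded sketch: `TokenBucket` is a hypothetical class, the 150-per-second default mirrors the example rate above, and the `clock` and `sleep` parameters exist only so the behavior can be tested without real waiting. Add a lock before sharing one instance across threads.

```python
import time


class TokenBucket:
    """Token bucket: refills continuously, allows bursts up to `capacity`."""

    def __init__(self, rate_per_second=150.0, capacity=150.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.rate = rate_per_second
        self.capacity = capacity
        self.clock = clock
        self.sleep = sleep
        self.tokens = capacity
        self.updated = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def acquire(self):
        """Block until one token is available, then spend it."""
        self._refill()
        while self.tokens < 1:
            # Sleep just long enough for the missing fraction to refill.
            self.sleep((1 - self.tokens) / self.rate)
            self._refill()
        self.tokens -= 1
```

Call `acquire()` before each request, the same way as in the Node.js version. Unlike a limiter that resets once per second, this one spreads permits evenly across the second, which avoids a burst at every window boundary.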

## Should I throttle proactively or back off after a 429?

Use both, because they solve different problems. Proactive throttling caps your outbound rate below the ceiling so most requests never hit a limit, which keeps latency steady. Reactive backoff is the recovery net for the requests that slip through anyway: a quota you share with other traffic, a provider-side spike, or a burst your limiter didn't predict. A limiter set to 150 requests per second under the 200 requests per second Messages limit handles the first; the backoff loop, capped at 5 retries, handles the second.

Lean on proactive pacing for steady, predictable workloads where you control the request rate, such as a nightly sync or a polling job. Lean on reactive backoff for traffic you can't shape, such as user-triggered actions that arrive in unpredictable bursts. The cost of getting the balance wrong is asymmetric. Throttle too aggressively and a backfill that should take an hour takes three; throttle too loosely and you trade a few saved seconds for repeated `429` responses that each cost a full `Retry-After` wait, often longer than the request you were trying to save. When in doubt, set the limiter conservatively and let backoff absorb the rare miss.

## How do I backfill 50 accounts without tripping limits?

Backfill per grant, not per fleet. Each Messages limit is scoped to one grant at up to 200 requests per second, so 50 grants give you 50 independent budgets rather than one shared pool. Run a bounded number of grants in parallel, each with its own per-grant limiter set near 150 requests per second, and the fleet stays under every ceiling at once. The constraint that bites first is almost always a provider quota, not the API limit.

Concretely: spin up a worker pool, assign each grant its own token bucket and backoff loop, and cap concurrency at the slowest provider's tolerance. For Microsoft mailboxes, hold in-flight requests at or below 4 per mailbox, since the MailboxConcurrency limit rejects a fifth simultaneous request outright. For Gmail accounts, pace per user under 6,000 quota units per minute, and remember a `messages.get`-heavy backfill spends units 4x faster than a list-only pass. Read `Nylas-Provider-Request-Count` on each response to learn the real fan-out per request, then tune each worker's rate down until the count stops climbing. A 50-account backfill that respects per-grant budgets and honors `Retry-After` finishes without a single account entering a 20-minute Exchange throttle.
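The worker-pool layout described above could be sketched with the standard library as follows. Everything here is illustrative: `backfill_grant`, `backfill_fleet`, and `MAX_PARALLEL_GRANTS` are hypothetical names, `jobs` stands for whatever independent units of work you have per grant (for example, message IDs to fetch), and `fetch_one` is a callable you supply that performs one API request with its own limiter and backoff loop.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT_PER_MAILBOX = 4  # Microsoft concurrency cap cited above
MAX_PARALLEL_GRANTS = 10       # hypothetical pool size; tune for your fleet


def backfill_grant(grant_id, jobs, fetch_one):
    """Process one grant's jobs with at most 4 requests in flight."""
    # A 4-worker pool per grant is what enforces the per-mailbox cap.
    with ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT_PER_MAILBOX) as pool:
        return list(pool.map(lambda job: fetch_one(grant_id, job), jobs))


def backfill_fleet(grants, fetch_one):
    """Run a bounded number of grants in parallel, each with its own pool."""
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL_GRANTS) as pool:
        futures = {
            grant_id: pool.submit(backfill_grant, grant_id, jobs, fetch_one)
            for grant_id, jobs in grants.items()
        }
        # result() re-raises any exception from that grant's worker.
        return {grant_id: f.result() for grant_id, f in futures.items()}
```

Because each grant gets its own pool, one throttled mailbox slows only its own jobs. The outer pool size bounds total parallelism across the fleet, so at most `MAX_PARALLEL_GRANTS * 4` requests are in flight at once.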

## What's next

- [API and provider rate limits](/docs/dev-guide/platform/rate-limits/) for the exact per-grant and per-provider numbers
- [Error monitoring and handling](/docs/dev-guide/best-practices/error-handling/) for retry strategy across all error types
- [Gmail API quotas and limits](/docs/cookbook/email/gmail-api-quotas/) for Gmail's per-method quota unit costs
- [Client error responses 400-499](/docs/api/errors/400-response/) for every `429` variant and its fix