Send email reliably at scale

Sending one message is a single API call. Sending hundreds of thousands across many connected accounts is a throughput and reliability problem: provider per-user caps, transient 429 and 5xx responses, and the risk of duplicate sends on retry. This recipe is the design layer. It covers how to spread load under provider limits, queue work, back off correctly, and stay idempotent. It does not repeat the Gmail quota numbers or the sending error reference; it links to both.

What limits how fast you can send email?

The hard ceiling is per-user, set by each mail provider, not by the API in front of it. Gmail caps consumer accounts near 500 messages per day and Google Workspace accounts near 2,000 per day, while Microsoft applies its own throttling that varies by tenant. The limit lives on the mailbox, not the application.

Because the limit is per mailbox, total throughput scales with the number of connected accounts, not with how fast you push one of them. That single fact shapes the whole design. A naive worker that drains a queue as fast as possible will burn through one user’s daily cap in minutes and then collect 429 responses for the rest of the day. The right model is one logical rate budget per grant, with the pipeline spreading each account’s sends across the day. When you need volume that no single mailbox can carry, the mailbox itself is the wrong unit, and you move to app-owned sending instead.
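
One way to picture a per-grant rate budget is as a minimum spacing between sends, derived from the daily cap. The sketch below is illustrative only: `GrantPacer`, `min_interval_seconds`, and the `safety` headroom factor are hypothetical names, the state lives in process memory rather than a shared store, and the caps passed in are the approximate figures quoted above rather than authoritative quota numbers.

```python
import time
from typing import Dict, Optional

SECONDS_PER_DAY = 24 * 60 * 60


def min_interval_seconds(daily_cap: int, safety: float = 0.8) -> float:
    """Spacing between sends that uses at most `safety` of a daily cap."""
    return SECONDS_PER_DAY / (daily_cap * safety)


class GrantPacer:
    """Hypothetical helper: remembers when each grant may send next."""

    def __init__(self) -> None:
        self._next_allowed: Dict[str, float] = {}  # grant_id -> unix timestamp

    def try_acquire(self, grant_id: str, daily_cap: int,
                    safety: float = 0.8, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if now < self._next_allowed.get(grant_id, 0.0):
            return False  # too soon for this mailbox; requeue with a delay
        self._next_allowed[grant_id] = now + min_interval_seconds(daily_cap, safety)
        return True
```

With no headroom (`safety=1.0`), a cap of 500 per day works out to one send every 172.8 seconds, and a cap of 2,000 per day to one every 43.2 seconds. A worker that gets `False` back would requeue the job rather than call the provider.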

How do you queue email sends with a worker pool?

Put every send on a durable queue and let a fixed pool of workers drain it, so a traffic spike becomes queue depth instead of a flood of provider calls. Each job carries the grant_id, the message payload, and a stable idempotency key. Workers respect a per-grant rate budget, which keeps any one mailbox under its provider cap.

A queue decouples the moment you decide to send from the moment the provider accepts the message. Keep the per-grant concurrency low, often just 2 requests in flight per mailbox, since the daily cap, not parallelism, is the binding constraint. Size the global worker pool to your total accepted throughput, and let queue depth absorb bursts. Each worker calls the grant send endpoint, then either acknowledges the job or requeues it with delay. The call below is the unit of work every worker runs.

curl --request POST \
  --url "https://api.us.nylas.com/v3/grants/<NYLAS_GRANT_ID>/messages/send" \
  --header 'Authorization: Bearer <NYLAS_API_KEY>' \
  --header 'Content-Type: application/json' \
  --header 'Idempotency-Key: <STABLE_JOB_KEY>' \
  --data '{
    "to": [{ "name": "Kim Townsend", "email": "<RECIPIENT_EMAIL>" }],
    "subject": "Your March statement",
    "body": "Your statement is ready."
  }'

For the full send field reference and the SDK forms of this call, see Send email without SMTP.
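
The worker-pool shape described in this section can be sketched roughly as follows. This is a minimal illustration under several assumptions: an in-memory `queue.Queue` stands in for the durable queue, `send_fn` is a hypothetical callable that performs the send call and returns the HTTP status, and the job dictionary keys (`grant_id`, `idempotency_key`) are illustrative names. Requeue-with-delay is left out to keep the sketch short.

```python
import queue
import threading
from collections import defaultdict

PER_GRANT_IN_FLIGHT = 2  # low per-mailbox concurrency, as discussed above


def run_pool(jobs, send_fn, num_workers=4):
    """Drain `jobs` with a fixed pool; returns {idempotency_key: status}."""
    q = queue.Queue()
    for job in jobs:
        q.put(job)

    lock = threading.Lock()
    # One bounded semaphore per grant caps that mailbox's in-flight calls.
    grant_slots = defaultdict(
        lambda: threading.BoundedSemaphore(PER_GRANT_IN_FLIGHT))
    results = {}

    def worker():
        while True:
            try:
                job = q.get_nowait()
            except queue.Empty:
                return  # drained; a long-running worker would block instead
            with lock:  # guard lazy creation of the per-grant semaphore
                slot = grant_slots[job["grant_id"]]
            with slot:
                status = send_fn(job)
            with lock:
                results[job["idempotency_key"]] = status
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The pool size bounds total provider calls in flight, while the per-grant semaphore keeps any one mailbox at two concurrent requests regardless of how many workers are free.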

How do you retry email sends with exponential backoff?

Retry only transient failures, and increase the wait between attempts so you do not amplify an overload. Retry on HTTP 429 (rate limited or quota exceeded) and on 5xx responses such as 503, which mean the provider or the platform is temporarily unavailable. Do not retry 400, 402, or 403, because those fail the same way every time.

Exponential backoff doubles the delay each attempt, for example 1s, 2s, 4s, 8s, capped near 60 seconds, with random jitter added so a fleet of workers does not retry in lockstep. Set a ceiling of 5 to 7 attempts; past that, route the job to a dead-letter queue for inspection rather than looping forever. A 429 usually means the mailbox has hit, or is close to, its provider cap, so back that grant off for longer, often minutes, not seconds. The error meanings are in the sending errors reference. The helper below computes the delay.

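import random

# Transient statuses worth retrying; 400, 402, and 403 are deliberately absent.
RETRYABLE = {429, 500, 502, 503, 504}

def backoff_seconds(attempt: int) -> float:
    # attempt is 0-based: base delays of 1, 2, 4, 8, ... capped at 60s
    base = min(60, 2 ** attempt)
    # Add up to 50% random jitter so workers do not retry in lockstep;
    # at the cap the returned delay can therefore reach 90s.
    return base + random.uniform(0, base * 0.5)

def should_retry(status: int, attempt: int) -> bool:
    # Past the attempt ceiling the caller should dead-letter the job.
    return status in RETRYABLE and attempt < 6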
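
To show how the retry rules in this section might combine into a single decision per attempt, here is a self-contained sketch. The function name `next_action`, the action labels, and the 300-second cooldown for a 429 are assumptions chosen for illustration (the text only says "minutes, not seconds"); the retryable set and the attempt bound mirror the helper above.

```python
import random

RETRYABLE = {429, 500, 502, 503, 504}
RETRY_LIMIT = 6                # same bound as should_retry above
QUOTA_COOLDOWN_SECONDS = 300   # assumed value for "minutes, not seconds"


def next_action(status: int, attempt: int, rng=random):
    """Decide what to do after one send attempt (attempt is 0-based).

    Returns ("ack", 0.0), ("retry", delay_seconds), or ("dead_letter", 0.0).
    """
    if 200 <= status < 300:
        return ("ack", 0.0)
    if status not in RETRYABLE or attempt >= RETRY_LIMIT:
        # Non-transient failure or attempts exhausted: park for inspection.
        return ("dead_letter", 0.0)
    if status == 429:
        base = QUOTA_COOLDOWN_SECONDS  # the mailbox is throttled; wait longer
    else:
        base = min(60, 2 ** attempt)   # exponential backoff capped at 60s
    return ("retry", base + rng.uniform(0, base * 0.5))  # plus jitter
```

A worker would requeue the job with the returned delay on `"retry"`, acknowledge it on `"ack"`, and move it to the dead-letter queue otherwise.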

How do you avoid sending duplicate email?

Attach a stable idempotency key to every send job and check it before the provider call, so a retried job never delivers the message twice. Derive the key from something durable, like the order ID plus a template name, not from a random value generated at retry time. A key built this way is identical after a crash and on every retry.

A worker that crashes after the provider accepts a message but before it acknowledges the job will retry, and the key is what stops a second delivery. The pattern has two halves. First, record the key in a store the moment a send succeeds, with a retention window of at least 24 hours so late retries still match. Second, before each attempt, look the key up and skip the call if it already succeeded. The 429 and 5xx retries from the previous section make this mandatory, because every retry is a chance to double-send. Pass the key on the request as shown in the queue example above, and treat the store check as the first step inside each worker, ahead of the network call.
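
The two halves of the pattern, record on success and check before each attempt, might look like the sketch below. Everything here is an assumption for illustration: `SentKeyStore` is a hypothetical in-memory stand-in for a shared store, `idempotency_key` shows one possible key format, and `send_fn` is a placeholder callable that performs the send and returns the HTTP status.

```python
import time
from typing import Callable, Dict, Optional

RETENTION_SECONDS = 24 * 60 * 60  # keep keys at least 24 hours


def idempotency_key(order_id: str, template: str) -> str:
    """Derive the key from durable job facts, never from a random value."""
    return f"{order_id}:{template}"


class SentKeyStore:
    """Hypothetical in-memory store; production needs a shared, durable one."""

    def __init__(self, retention: float = RETENTION_SECONDS) -> None:
        self._sent: Dict[str, float] = {}  # key -> time the send succeeded
        self._retention = retention

    def already_sent(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        sent_at = self._sent.get(key)
        return sent_at is not None and now - sent_at < self._retention

    def mark_sent(self, key: str, now: Optional[float] = None) -> None:
        self._sent[key] = time.time() if now is None else now


def send_once(key: str, send_fn: Callable[[], int], store: SentKeyStore,
              now: Optional[float] = None) -> str:
    if store.already_sent(key, now):
        return "skipped"           # an earlier attempt already succeeded
    status = send_fn()             # the network call comes after the check
    if 200 <= status < 300:
        store.mark_sent(key, now)  # record the key as soon as the send succeeds
        return "sent"
    return "failed"
```

A failed attempt leaves no record, so the retry is free to call the provider again, while a retry of a job that already succeeded returns `"skipped"` without touching the network.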

How do you monitor an email send pipeline?

Track four signals so a stuck pipeline surfaces before customers report it: queue depth, send success rate, retry rate, and dead-letter count. A retry rate climbing past roughly 5 percent of attempts usually means one or more grants are hitting provider caps, and a growing dead-letter queue means non-transient failures are piling up without anyone inspecting them.

Emit a metric on every outcome, tagged by status code and by grant_id, so you can tell a single throttled mailbox apart from a platform-wide problem. Alert on queue depth that keeps rising over a 15-minute window, since that means workers cannot keep pace with intake. Watch the 429 rate per grant to catch accounts approaching their daily ceiling, and watch 5xx rate across all grants to catch provider or platform incidents. Log the idempotency key with each send so you can trace a specific message end to end when you investigate a duplicate or a miss.
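
As a rough illustration of these signals, the sketch below counts outcomes tagged by status and grant, derives a retry rate, and checks whether queue depth kept rising across a window of samples. The class and function names are hypothetical, and a real pipeline would emit to a metrics system rather than hold counters in memory.

```python
from collections import Counter


class SendMetrics:
    """Hypothetical in-process counters for send outcomes."""

    def __init__(self) -> None:
        self.attempts = 0
        self.retries = 0
        self.by_status = Counter()        # status -> count
        self.by_grant_status = Counter()  # (grant_id, status) -> count

    def record(self, grant_id: str, status: int, is_retry: bool = False) -> None:
        self.attempts += 1
        self.retries += int(is_retry)
        self.by_status[status] += 1
        self.by_grant_status[(grant_id, status)] += 1

    def retry_rate(self) -> float:
        return self.retries / self.attempts if self.attempts else 0.0

    def throttled_grants(self, min_429: int = 1) -> list:
        """Grants with at least `min_429` rate-limit responses."""
        return sorted(g for (g, s), n in self.by_grant_status.items()
                      if s == 429 and n >= min_429)


def depth_keeps_rising(samples: list) -> bool:
    """True if each queue-depth sample in the window exceeds the previous one."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))
```

Comparing `throttled_grants()` against the overall 5xx count in `by_status` is one way to tell a single throttled mailbox from a platform-wide incident.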

When should you send without a user mailbox?

When the mail belongs to your application rather than a person, the per-user cap stops being the right constraint. Password resets, receipts, and alerts have no user mailbox to send from. Route them through transactional sending from a verified domain, which needs no grant and no OAuth.

Routing application mail through a grant wastes that user’s daily budget, which on a Gmail consumer account is only about 500 messages, and tangles application mail with personal mail. The split is ownership. Use a grant when a real user owns the address and replies should land in their inbox, and use domain send when your application owns the message. For high-volume application mail, domain send sidesteps the per-mailbox ceilings entirely, so the queue, backoff, and idempotency patterns here apply to a throughput limit you control rather than one each provider sets per user.

What’s next

Gmail API quotas and limits lists the exact per-method and per-user numbers behind Gmail caps.
Sending errors is the status-code reference for what to retry and what to fail.
Send email without SMTP is the single send call each worker runs.
Send transactional email from a domain handles app-owned mail above per-user caps.