Retry and debug failed webhooks

One slow database write inside your handler is all it takes to lose events. If your endpoint doesn’t answer within 10 seconds, the request fails and the notification gets queued for retry. A deploy that returns 502 for 30 seconds, a burst of event.updated notifications that backs up your worker pool, an exception that throws before you send 200 OK — each one drops events your app needed.

Reliable delivery comes down to three habits: acknowledge fast, treat every delivery as a possible duplicate, and know how to inspect a webhook that’s gone quiet. This recipe covers all three.

Return 200 OK the instant a notification arrives, then do the real work in a background queue. The API enforces a 10-second timeout: if your endpoint spends those seconds parsing a 1 MB payload or writing to a database, the request fails and counts against your endpoint’s health. Acknowledge first, process second.

The pattern is simple: read the raw body, push it onto a queue, and answer within milliseconds. The handler never touches your database on the request path, so a slow query can’t blow the 10-second budget. A separate worker fetches objects and updates state.
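As a minimal sketch of that acknowledge-first pattern, the Python below uses only the standard library. The in-process `queue.Queue` stands in for whatever broker you actually run, and the names `accept_notification`, `WebhookHandler`, and `worker` are illustrative assumptions, not part of any Nylas SDK.

```python
import logging
import queue
import threading
from http.server import BaseHTTPRequestHandler

# Assumption: an in-process queue stands in for a real broker (Redis, SQS, ...).
notification_queue = queue.Queue()


def accept_notification(raw_body, q=notification_queue):
    """Enqueue the raw body untouched and return the status code to send."""
    q.put_nowait(raw_body)
    return 200


class WebhookHandler(BaseHTTPRequestHandler):
    """Request path: read bytes, enqueue, answer. No parsing, no database."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw_body = self.rfile.read(length)
        status = accept_notification(raw_body)
        self.send_response(status)
        self.end_headers()


def worker(q, process, stop):
    """Drain the queue off the request path until `stop` is set."""
    while not stop.is_set():
        try:
            raw_body = q.get(timeout=0.1)
        except queue.Empty:
            continue
        try:
            process(raw_body)  # slow work: parse, fetch objects, write state
        except Exception:
            logging.exception("processing failed; the event stays acknowledged")
        finally:
            q.task_done()
```

To serve it, pass `WebhookHandler` to `http.server.ThreadingHTTPServer` and start `worker` in a background thread. An in-memory queue loses its contents when the process restarts, so a durable queue is the safer choice once this leaves a prototype.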

For the architecture behind this, including load balancing and downtime handling, see Best practices for webhooks.

The Nylas APIs guarantee at-least-once delivery, so your handler will sometimes see the same event twice. When the first delivery doesn’t return 200 OK, the API retries up to two more times for three attempts total, backing off exponentially, with the final attempt landing 10 to 20 minutes after the first.

Make handlers idempotent by checking the notification’s id against a set of IDs you’ve already processed before you act. Record each ID in a store with a short expiry, skip anything you’ve already seen, and your logic stays correct whether an event arrives once or three times.
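One way to sketch that check is an in-memory store of seen IDs with an expiry. The class `SeenIds`, the helper `handle_once`, and the one-hour default are assumptions for illustration; the only requirement taken from the text is that the expiry outlasts the 10-to-20-minute retry window. In production a shared store such as Redis would usually replace the dictionary.

```python
import time


class SeenIds:
    """Remembers notification IDs for `ttl_seconds`, then forgets them."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires = {}  # notification id -> expiry timestamp

    def first_time(self, notification_id):
        """Return True and record the ID if it has not been seen recently."""
        now = self._clock()
        # Drop expired entries so the store does not grow without bound.
        for key in [k for k, exp in self._expires.items() if exp <= now]:
            del self._expires[key]
        if notification_id in self._expires:
            return False
        self._expires[notification_id] = now + self._ttl
        return True


def handle_once(notification, seen, act):
    """Run `act` only for notifications whose id has not been processed."""
    if not seen.first_time(notification["id"]):
        return False  # duplicate delivery: skip it
    act(notification)
    return True
```

This version records the ID before calling `act`, so a crash inside `act` leaves the event marked as seen. If you would rather reprocess after a failure, record the ID only once `act` returns.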

Retries fire only for status codes that signal a temporary problem: 408, 429, 502, 503, 504, and 507. The API treats every other code, including 4xx auth and validation errors, as permanent and never retries it, so a bug that returns 400 drops the event without warning.
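The rule can be written down as a small lookup, which is handy in tests that assert what your handler returns on each failure path. This is a sketch that takes the paragraph literally: only 200 counts as delivered, and the names `RETRIED_STATUS_CODES` and `delivery_outcome` are illustrative.

```python
# Status codes the text lists as retried by the API.
RETRIED_STATUS_CODES = frozenset({408, 429, 502, 503, 504, 507})


def delivery_outcome(status):
    """Classify what happens to a notification after your endpoint responds."""
    if status == 200:
        return "delivered"
    if status in RETRIED_STATUS_CODES:
        return "retried"
    return "dropped"  # treated as permanent: no further attempts
```

For example, `delivery_outcome(503)` returns `"retried"` while `delivery_outcome(400)` returns `"dropped"`. If your handler must fail on a temporary condition, a code from the retried set is the one that keeps the event alive.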

When notifications stop arriving, work the chain from your endpoint back to the subscription. Most silent failures trace to one of four causes: the endpoint isn’t reachable over public HTTPS, the challenge verification never completed, the subscription lacks the right trigger_types, or the destination is in a failing or failed state. Check each before suspecting the API.

Walk this checklist:

  1. Reach the endpoint — confirm your webhook_url is public HTTPS and returns 200 to a GET. The API blocks Ngrok URLs, so use VS Code port forwarding or Hookdeck for local testing.
  2. Verification — the first activation sends a GET with a challenge query parameter you must echo verbatim in a 200 OK within 10 seconds. Miss it and the webhook never activates.
  3. Triggers — list your webhook and confirm trigger_types includes the event you expect. A handler waiting on message.created sees nothing if you only subscribed to event.created.
  4. Health — a destination marked failing (95% non-200 over 15 minutes) or failed (95% over 72 hours) stops normal delivery. Reactivate it from the Dashboard or the Webhooks API.
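Step 2 of the checklist, the challenge echo, can be sketched as a pure function that a GET handler calls. The function name `challenge_response` and the choice to answer 400 when the parameter is missing are assumptions; the text only specifies echoing the `challenge` value in a 200 OK.

```python
from urllib.parse import parse_qs, urlsplit


def challenge_response(request_target):
    """Return (status, body) for the verification GET.

    `request_target` is the path plus query string, e.g. "/webhook?challenge=abc".
    """
    params = parse_qs(urlsplit(request_target).query, keep_blank_values=True)
    values = params.get("challenge")
    if not values:
        return 400, b""  # assumption: reject GETs that carry no challenge
    # Echo the value exactly as received, with nothing added around it.
    return 200, values[0].encode("utf-8")
```

Keep this path free of slow work such as database lookups, because the verification request shares the same 10-second limit as normal deliveries.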

Once the subscription looks right, send a known-good payload instead of waiting for a real event. The Send Test Event endpoint posts a test notification and reports whether your endpoint answered 200 OK, and the Get Mock Notification Payload endpoint returns an example body for any trigger_type so you can unit-test your parser.

Delivery is best-effort with bounded retries, so design for gaps and repeats rather than assuming one in-order arrival per event. The API attempts each notification up to three times over a 10-to-20-minute window. If all three fail, it skips that notification type and keeps sending others. After 72 hours of 95% non-200 responses, it marks the endpoint failed.

A few specifics worth building around:

  • Timeouts count as failures. Your endpoint has 10 seconds per request. The verification GET shares the same 10-second budget, and the API verifies an endpoint only once — fail the first check and you must recreate or reactivate the webhook.
  • Ordering isn’t guaranteed. Per the Standard Webhooks specification, notifications can arrive out of order. Handle an .updated or .deleted notification that lands before the matching .created.
  • Payloads cap at 1 MB. Larger message.* notifications arrive truncated with a .truncated suffix and no body, so re-query the object with a Get Message request when you see one.
  • Verify before you trust. Every notification carries an X-Nylas-Signature header, a hex HMAC-SHA256 of the raw body. See Verify webhook signatures for the check, and Secure a webhook for the secret it uses.
  • Allowlist the sender. Failure-state notifications come by email from [email protected]. Allowlist that address so the warning doesn’t land in Spam.
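The signature bullet above can be sketched with Python's standard `hmac` module. This assumes the webhook secret is used as the HMAC key in its UTF-8 encoding, which the text does not spell out, so confirm it against Verify webhook signatures. The function name `signature_is_valid` is illustrative.

```python
import hashlib
import hmac


def signature_is_valid(raw_body, header_value, webhook_secret):
    """Compare the X-Nylas-Signature header to an HMAC-SHA256 of the raw body."""
    # Assumption: the secret is keyed as UTF-8 bytes.
    expected = hmac.new(
        webhook_secret.encode("utf-8"), raw_body, hashlib.sha256
    ).hexdigest()
    received = header_value.strip().lower()
    # Constant-time comparison on bytes avoids leaking how many digits matched.
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))
```

Compute the digest over the exact bytes you received, before any JSON parsing or re-serialization, since even a whitespace change produces a different signature.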

For the exact retry codes and failure thresholds, see Retry a webhook.