Backfill historical email

When a user first connects an account, you often want their full history, not just the recent window. This page covers a one-time backfill: page through the entire mailbox with the Messages API, reach mail older than the rolling IMAP cache with query_imap, and throttle the run so you stay under provider rate limits. It builds on the cursor mechanics in Gmail API pagination, so this page links those details rather than repeating them.

What is an email backfill?

An email backfill is a one-time import that pulls a user’s existing mail into your application right after they connect an account, rather than only capturing messages that arrive afterward. You typically run it once per grant, then switch to webhooks for ongoing updates.

A backfill is the historical counterpart to real-time sync. Real-time sync, driven by webhooks, tells you about messages that arrive after a user connects. The backfill handles everything that already existed: a mailbox can hold 50,000 messages or more accumulated over a decade, and none of it triggers a message.created event because it predates the connection. You run the backfill exactly once per grant, store a completion marker, and never repeat it. Treating backfill and live sync as two separate jobs keeps each one simple. The backfill is a bounded batch task you can pause and resume, while the webhook listener stays a lightweight always-on process.

How do you backfill a full mailbox with the Nylas API?

Send GET /v3/grants/{grant_id}/messages with a limit of 200 and follow the next_cursor field through every page until it stops returning a value. Each page holds full message objects, so one pass captures the entire mailbox without a separate fetch step.

The backfill loop is the standard cursor pattern, run to completion instead of for a single screen. The Nylas Email API returns up to 200 full message objects per request, including subject, sender, body, and folders, so you do not need the two-step list-then-get pattern that native APIs require. At the maximum page size, a 20,000-message mailbox takes 100 sequential requests (20,000 / 200). The cursor mechanics, including how page_token and next_cursor relate, are covered in the pagination guide; the loop below is the part specific to a full backfill.

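import requests

def backfill_mailbox(grant_id, api_key, store):
    """Page through every message for one grant and return the total saved."""
    url = f"https://api.us.nylas.com/v3/grants/{grant_id}/messages"
    headers = {"Authorization": f"Bearer {api_key}"}
    cursor = None
    total = 0

    while True:
        params = {"limit": 200}  # maximum page size
        if cursor:
            # Pass the previous response's next_cursor back as page_token.
            params["page_token"] = cursor

        # A timeout keeps a stalled connection from hanging the whole job.
        resp = requests.get(url, headers=headers, params=params, timeout=30)
        resp.raise_for_status()
        body = resp.json()

        store.save_batch(body["data"])
        total += len(body["data"])

        # No next_cursor means this was the last page.
        cursor = body.get("next_cursor")
        if not cursor:
            break

    return total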

How do you import email older than the sync cache?

On IMAP providers, Nylas keeps a rolling cache of the most recent 90 days, so a plain backfill stops at that boundary. To reach older mail, add query_imap=true with an in folder, which queries the provider’s IMAP server directly and exposes the full mailbox.

This is the single most important detail for a complete historical import. Google and Microsoft expose the entire mailbox through their native APIs, so a standard cursor loop already reaches mail from years ago. IMAP-based providers (Yahoo, iCloud, and generic IMAP) work differently: Nylas syncs the last 90 days into a cache, and a normal list call only sees that window. Setting query_imap=true bypasses the cache and queries the live IMAP server, which returns older messages at the cost of higher latency. The parameter requires an in value, so you run one pass per folder. The 90-day cache behavior is documented in detail in the Yahoo messages guide.

curl --request GET \
  --url "https://api.us.nylas.com/v3/grants/<NYLAS_GRANT_ID>/messages?query_imap=true&in=INBOX&limit=10" \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer <NYLAS_API_KEY>'

When using query_imap, you must include the in parameter to specify which folder to search.
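Because each query_imap request is scoped to one folder, a complete IMAP backfill is a loop over folders with a cursor loop inside each one. The sketch below is one way to structure that, under a few assumptions: next_cursor is assumed to paginate query_imap results the same way it does for the cached list call, the folder names are supplied by the caller, and fetch_page and save_batch are hypothetical callables (fetch_page takes the query parameters and returns the parsed JSON response). The page size is left as a parameter because the example above only shows limit=10.

```python
def backfill_imap_folders(fetch_page, folders, save_batch, limit=50):
    """Run one cursor loop per folder with query_imap enabled.

    fetch_page(params) -> dict is a hypothetical callable that sends
    GET /v3/grants/{grant_id}/messages and returns the parsed JSON body.
    save_batch(messages) is a hypothetical callable that stores one page.
    Returns a dict mapping each folder to the number of messages saved.
    """
    counts = {}
    for folder in folders:
        cursor = None
        counts[folder] = 0
        while True:
            # query_imap requires an `in` value, so each pass targets one folder.
            params = {"query_imap": "true", "in": folder, "limit": limit}
            if cursor:
                params["page_token"] = cursor

            body = fetch_page(params)
            save_batch(body["data"])
            counts[folder] += len(body["data"])

            # Assumption: a missing next_cursor marks the end of the folder.
            cursor = body.get("next_cursor")
            if not cursor:
                break
    return counts
```

Injecting fetch_page keeps the folder loop independent of the HTTP client, so the same throttled request helper used for the cached backfill can be reused here.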

How do you avoid rate limits during a backfill?

Pace the backfill so you do not exhaust the provider’s per-user quota in one burst. Add a short delay between pages, cap concurrent grants, and respect any wait duration the API returns on a throttled response. A backfill is bursty by nature, so deliberate throttling keeps it from disrupting live traffic.

A full import sends hundreds of sequential requests against one account, which is exactly the traffic shape that trips rate limits. Gmail allows 250 quota units per second per user (see the rate limits guide), and IMAP providers throttle without publishing fixed numbers. Three habits keep a backfill safe: insert a delay of roughly 200 ms between pages so a single grant cannot saturate the quota, limit how many grants backfill at once with a worker pool, and honor the retry-after duration Nylas returns on a throttled response instead of retrying immediately. Run backfills as a background queue separate from interactive requests so a large import never slows the experience for active users.

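The helper below wraps a single page request: it pauses for a fixed delay after each successful response and retries once when the API signals throttling. Call it from the page loop in place of a bare requests.get.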

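import time
import requests

def fetch_page(url, headers, params, retry=True):
    """Fetch one page, pausing after success and retrying once on HTTP 429."""
    resp = requests.get(url, headers=headers, params=params, timeout=30)

    if resp.status_code == 429 and retry:
        # Honor the wait the API asks for; fall back to 5 seconds if the
        # header is missing or is not a plain number of seconds.
        try:
            wait = int(resp.headers.get("Retry-After", "5"))
        except ValueError:
            wait = 5
        time.sleep(wait)
        return fetch_page(url, headers, params, retry=False)

    resp.raise_for_status()
    time.sleep(0.2)  # 200 ms between pages to stay under per-user quota
    return resp.json()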
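The second habit, capping how many grants backfill at once, can be sketched with a bounded thread pool from the standard library. This is a minimal illustration, not a production queue: backfill_one is a hypothetical callable that runs the full page loop for one grant, and the default of 4 concurrent grants is an arbitrary placeholder to tune against your own quota.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def backfill_grants(grant_ids, backfill_one, max_concurrent=4):
    """Backfill many grants with at most max_concurrent running at once.

    backfill_one(grant_id) is a hypothetical callable that imports one
    mailbox and returns its message count. Returns (results, failures).
    """
    results, failures = {}, {}
    # The pool size is the concurrency cap: extra grants wait in the queue.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = {pool.submit(backfill_one, g): g for g in grant_ids}
        for future in as_completed(futures):
            grant_id = futures[future]
            try:
                results[grant_id] = future.result()
            except Exception as exc:
                # Record the error so one failed grant does not stop the rest.
                failures[grant_id] = exc
    return results, failures
```

Collecting failures separately lets you requeue only the grants that did not finish instead of rerunning the whole batch.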

How do you resume a backfill that stops partway?

Persist the latest next_cursor to durable storage after every saved batch, alongside the messages from that page. If the job stops for any reason, restart it by passing the stored cursor as page_token, and it resumes from the last completed page instead of starting over from the beginning of the mailbox.

A backfill that takes minutes to hours will eventually meet a network blip, a deploy, or a process restart. Without saved state, the only recovery is to restart from page one, which wastes quota and risks duplicate writes. The fix is to checkpoint the cursor: write the next_cursor to durable storage in the same transaction that saves each batch, so the stored cursor and the saved messages never drift apart. On restart, read the cursor and resume. Because the cursor encodes position rather than a timestamp, a resumed run picks up where the previous one stopped without re-fetching pages it already saved. Pair this with an idempotent write keyed on message ID, so that if a batch was saved but its cursor was not recorded, fetching that page again does not create duplicates.
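The checkpoint pattern can be sketched with SQLite from the Python standard library. This is a minimal sketch under stated assumptions: the table names, the done column used as the completion marker, and the fetch_page callable (which takes a cursor or None and returns the parsed JSON page) are all hypothetical, and a production system would more likely use its existing database.

```python
import json
import sqlite3

def init_db(conn):
    """Create the hypothetical message and checkpoint tables."""
    conn.execute("CREATE TABLE IF NOT EXISTS messages (id TEXT PRIMARY KEY, raw TEXT)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints "
        "(grant_id TEXT PRIMARY KEY, cursor TEXT, done INTEGER NOT NULL DEFAULT 0)"
    )

def resume_backfill(conn, grant_id, fetch_page):
    """Run or resume a backfill, checkpointing the cursor after every page."""
    row = conn.execute(
        "SELECT cursor, done FROM checkpoints WHERE grant_id = ?", (grant_id,)
    ).fetchone()
    if row and row[1]:
        return  # completion marker is set, nothing to do
    cursor = row[0] if row else None

    while True:
        body = fetch_page(cursor)
        next_cursor = body.get("next_cursor")
        # One transaction: the batch and its checkpoint commit together
        # or roll back together, so they cannot drift apart.
        with conn:
            conn.executemany(
                # INSERT OR IGNORE makes the write idempotent on message ID.
                "INSERT OR IGNORE INTO messages (id, raw) VALUES (?, ?)",
                [(m["id"], json.dumps(m)) for m in body["data"]],
            )
            conn.execute(
                "INSERT OR REPLACE INTO checkpoints (grant_id, cursor, done) "
                "VALUES (?, ?, ?)",
                (grant_id, next_cursor, 0 if next_cursor else 1),
            )
        if not next_cursor:
            break
```

If the process dies between pages, calling resume_backfill again with the same connection settings reads the stored cursor and continues from the next unsaved page; once the last page commits, the done flag turns later calls into no-ops.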

What’s next

Gmail API pagination and sync explains the cursor contract this backfill loops over.
How to list Yahoo email messages details the 90-day cache and query_imap behavior on IMAP providers.
Messages API reference lists every parameter for the list endpoint.
Get real-time updates with webhooks covers the live sync you run after the backfill completes.
Build a unified inbox applies the same cursor pagination across several connected accounts.