Gmail API pagination and sync explained

Paging through a Gmail mailbox sounds simple until you hit the details: messages.list returns IDs only, every page needs a token round-trip, and keeping a local copy current means tracking a historyId that expires. This page explains how the native Gmail API contract works, where it breaks, and how the same job looks with one cursor-based request.

How do Gmail API nextPageToken and maxResults work?

The Gmail API maxResults parameter sets the page size for users.messages.list, up to a maximum of 500 IDs per page (the default is 100). When more matching messages exist, the response includes a nextPageToken string. You pass it back as pageToken on the next request and loop until the response omits the token.

The important rule: maxResults is the page size, not a total cap. A mailbox with 10,000 matching messages needs at least 20 sequential requests at maximum page size. According to the Gmail API messages.list reference, each call costs 5 quota units and returns message IDs only, never subjects or bodies.

The Python loop below collects every message ID in a mailbox with the official google-api-python-client library. It runs 20 times for a 10,000-message inbox, and you still need a separate messages.get call (20 quota units each) for every body you want to read.

    from googleapiclient.discovery import build

    # `creds` is an authorized credentials object from your OAuth flow (not shown).
    service = build("gmail", "v1", credentials=creds)

    all_messages = []
    page_token = None

    while True:
        # Each call returns up to 500 message IDs plus an optional nextPageToken.
        response = service.users().messages().list(
            userId="me",
            maxResults=500,
            pageToken=page_token,
        ).execute()
        all_messages.extend(response.get("messages", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            break  # no token means this was the last page

    print(f"Fetched {len(all_messages)} message IDs")

That’s more than a dozen lines just to collect IDs. Fetching 10,000 full bodies afterward costs about 200,000 quota units in messages.get calls, which is why quota planning matters before a full sync (see Gmail API quotas).
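The arithmetic behind that estimate can be sketched in a few lines. This is a rough planning helper, not an official calculator: the per-call costs and the 6,000 units-per-minute figure are the numbers quoted on this page, and the function names are made up for illustration.

```python
import math

# Figures quoted on this page; confirm them against your own project's quota settings.
LIST_COST = 5              # quota units per messages.list call
GET_COST = 20              # quota units per messages.get call
MAX_PAGE_SIZE = 500        # largest maxResults value for messages.list
UNITS_PER_MINUTE = 6000    # per-user, per-minute quota


def full_sync_quota(message_count, page_size=MAX_PAGE_SIZE):
    """Return (list_calls, total_units) for listing and fetching every message."""
    # Even an empty mailbox needs one list call to find out it is empty.
    list_calls = max(1, math.ceil(message_count / page_size))
    total_units = list_calls * LIST_COST + message_count * GET_COST
    return list_calls, total_units


def minutes_at_quota(total_units, units_per_minute=UNITS_PER_MINUTE):
    """Lower bound on wall-clock minutes if the sync runs exactly at the quota."""
    return math.ceil(total_units / units_per_minute)


list_calls, total_units = full_sync_quota(10_000)
print(list_calls, total_units, minutes_at_quota(total_units))
```

For a 10,000-message mailbox this works out to 20 list calls and 200,100 units, so at the quoted per-user limit a full sync cannot finish in under about 34 minutes.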

How does Gmail incremental sync work with historyId?

Gmail incremental sync tracks mailbox changes through a monotonically increasing historyId. You store the ID from your last sync, then call users.history.list with startHistoryId to fetch only messages added, deleted, or relabeled since that point. Each call costs 2 quota units, 60% cheaper than a messages.list call at 5 units.

The Gmail API sync guide recommends this as the primary pattern for keeping a local copy current. The historyTypes parameter filters by change type: messageAdded, messageDeleted, labelAdded, and labelRemoved.

The function below pages through history records since a checkpoint and returns the new historyId to store for the next run. Like messages.list, the history endpoint paginates with nextPageToken, so even the delta path needs a token loop.

    def get_changes_since(service, start_history_id):
        """Fetch all mailbox changes since the given historyId."""
        changes = []
        page_token = None
        while True:
            response = service.users().history().list(
                userId="me",
                startHistoryId=start_history_id,
                historyTypes=["messageAdded", "messageDeleted"],
                pageToken=page_token,
            ).execute()
            changes.extend(response.get("history", []))
            page_token = response.get("nextPageToken")
            if not page_token:
                break
        # The last page carries the mailbox's current historyId: store it as the next checkpoint.
        new_history_id = response.get("historyId")
        return changes, new_history_id

There’s a catch: history records expire after roughly 30 days (Google guarantees a minimum of one week). If your stored historyId is too old, history.list returns 404 Not Found, and your code must fall back to a full re-pagination. Every production sync client needs both code paths.
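One way to wire the two code paths together is sketched below. It is a minimal illustration, not a complete client: `sync_mailbox`, `fetch_delta`, and `full_resync` are hypothetical names, and the status check relies on the `resp.status` attribute that `googleapiclient.errors.HttpError` carries.

```python
def http_status(exc):
    """Best-effort HTTP status from an exception.

    googleapiclient.errors.HttpError exposes the status as exc.resp.status;
    any exception without that attribute yields None.
    """
    return getattr(getattr(exc, "resp", None), "status", None)


def sync_mailbox(stored_history_id, fetch_delta, full_resync):
    """Try a delta sync; fall back to a full resync when the checkpoint expired.

    Both callables are hypothetical and supplied by the caller:
      fetch_delta(history_id) -> (changes, new_history_id)
      full_resync()           -> (messages, new_history_id)
    Returns (mode, items, new_history_id), where mode is "delta" or "full".
    """
    if stored_history_id is not None:
        try:
            changes, new_id = fetch_delta(stored_history_id)
            return "delta", changes, new_id
        except Exception as exc:
            if http_status(exc) != 404:
                raise  # unrelated failure: let the caller decide what to do
            # 404 means the stored historyId is too old; fall through.
    # No checkpoint yet, or the checkpoint has expired.
    items, new_id = full_resync()
    return "full", items, new_id
```

With the function above, `fetch_delta` could be `lambda h: get_changes_since(service, h)`. Returning the mode lets the caller decide whether to merge changes into the local copy or replace it outright.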

What goes wrong when you build Gmail sync yourself?

A production Gmail sync client has to handle the OAuth token lifecycle, the expired-history fallback, rate limiting, and partial failures in code you maintain. What starts as a 20-line pagination loop typically grows to 80-120 lines before logging, persistence, or multi-account support. The recurring failure points:

  • OAuth token management. Gmail access tokens expire every 3,600 seconds. The sync loop needs a refresh callback, expired-token detection, and a retry for the failed request.
  • Expired historyId fallback. When history.list returns 404, the client must discard the delta path and run a full pagination instead. Two code paths, both of which have to work.
  • Rate limiting. New Gmail API projects get 6,000 quota units per minute per user. A large sync needs client-side throttling and exponential backoff on 403 rateLimitExceeded and 429 Too Many Requests responses.
  • Partial page failures. A network error mid-pagination leaves you with half the results and a decision: retry from the start, or from the last good token? Either way you’re tracking state.
  • Setup overhead. Before any code runs, you need a Google Cloud project, an OAuth consent screen, a client ID and secret, and a redirect URI. That’s 15-20 minutes of console configuration, plus a verification review if your app requests restricted scopes.
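The rate-limiting and partial-failure items share one building block: retrying a single request with exponential backoff. The sketch below is one possible shape for it, with assumed defaults (five attempts, a one-second base delay) and a hypothetical `call_with_backoff` name. It treats every 403 as retryable for brevity; a production client would likely inspect the error reason first, because 403 can also signal a permissions problem.

```python
import random
import time

RETRYABLE_STATUSES = {403, 429}  # simplification: not every 403 is a rate limit


def http_status(exc):
    """Read exc.resp.status (as carried by googleapiclient's HttpError), else None."""
    return getattr(getattr(exc, "resp", None), "status", None)


def call_with_backoff(request_fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request_fn(), retrying rate-limit errors with exponential backoff.

    `sleep` is injectable so the function can be exercised without real waiting.
    """
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except Exception as exc:
            out_of_attempts = attempt == max_attempts - 1
            if http_status(exc) not in RETRYABLE_STATUSES or out_of_attempts:
                raise
            # Waits of about 1s, 2s, 4s, ... plus up to 1s of random jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
```

Wrapping each page request individually, for example `call_with_backoff(lambda: service.users().messages().list(userId="me", pageToken=page_token).execute())`, means a failure retries that page with the same token instead of restarting the whole listing.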

How do you paginate Gmail messages with the Nylas API?

The Nylas Email API replaces the two-step list-then-get pattern with one request. GET /v3/grants/{grant_id}/messages returns full message objects (subject, sender, body, folders) up to 200 per page, with a next_cursor field when more results exist. Token refresh, retries, and provider backoff happen server-side.

The Messages API returns paginated responses. When there are more results, the response includes a next_cursor value. Pass it back as page_token to get the next page, and keep paginating until the response comes back without a next_cursor.
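A minimal version of that loop in Python, using only the standard library, might look like the sketch below. Several details are assumptions rather than facts from this page: the `https://api.us.nylas.com` base URL (US region), the `limit` query parameter, the bearer-token header, and the `data` field holding the message list. The function names are illustrative, so check each detail against the Messages API reference before relying on it.

```python
import json
import urllib.parse
import urllib.request

NYLAS_BASE_URL = "https://api.us.nylas.com"  # assumption: US data region


def fetch_page(grant_id, api_key, page_token=None, limit=200):
    """Fetch one page of messages and return the decoded JSON response."""
    params = {"limit": limit}  # assumption: `limit` sets the page size
    if page_token:
        params["page_token"] = page_token
    url = (
        f"{NYLAS_BASE_URL}/v3/grants/{grant_id}/messages"
        f"?{urllib.parse.urlencode(params)}"
    )
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_all_messages(grant_id, api_key, fetch=fetch_page):
    """Follow next_cursor until the response no longer includes one."""
    messages = []
    page_token = None
    while True:
        page = fetch(grant_id, api_key, page_token)
        messages.extend(page.get("data", []))  # assumption: items live under "data"
        page_token = page.get("next_cursor")
        if not page_token:
            return messages
```

The `fetch` parameter exists so the loop can be tested with canned pages, and so a retry wrapper can be slotted in without touching the pagination logic.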

The same loop runs unchanged against Microsoft, Yahoo, iCloud, IMAP, and EWS accounts, because the cursor contract belongs to the unified API rather than to each provider. For the Gmail-specific listing walkthrough, including label filters and Gmail search operators, see How to list Google email messages.

Webhooks replace the historyId checkpoint entirely. Subscribing one HTTPS endpoint to the message.created trigger delivers a notification within seconds of a new message arriving, instead of a polling loop that fires 288 times per day per inbox at a 5-minute interval. There’s no Cloud Pub/Sub topic to create and no users.watch channel to renew every 7 days.

The real-time webhooks recipe covers the full setup: creating the webhook, responding to the challenge request, and verifying signatures. Pair it with webhook retry handling for delivery guarantees.
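The two receiver-side steps named above, answering the challenge and verifying signatures, reduce to two small functions. The sketch below assumes the challenge arrives as a `challenge` query parameter that must be echoed back, and that the signature is a hex-encoded HMAC-SHA256 of the raw request body keyed with the webhook secret. Both are assumptions to confirm against the recipe, and the function names are illustrative.

```python
import hashlib
import hmac
from urllib.parse import parse_qs


def challenge_reply(query_string):
    """Return the challenge value to echo back, or None if there is none."""
    values = parse_qs(query_string).get("challenge")
    return values[0] if values else None


def signature_is_valid(raw_body, signature, webhook_secret):
    """Check a hex HMAC-SHA256 signature over the raw (unparsed) request body.

    Assumption: the signature header carries hex(HMAC-SHA256(secret, body)).
    """
    expected = hmac.new(webhook_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking match length.
    return hmac.compare_digest(expected.encode(), (signature or "").encode())
```

Verification has to run on the exact bytes received, before any JSON parsing or re-serialization, because a single changed byte produces a different digest.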

How do other email providers handle pagination?

Each provider ships a different pagination contract, which is the main reason multi-provider sync code forks. Gmail uses an opaque nextPageToken with a 500-ID page cap. Microsoft Graph returns an @odata.nextLink URL the client follows verbatim, at up to 1,000 items per page. IMAP servers return every matching UID from UID SEARCH in one response. EWS pages with a numeric offset.

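| Provider                     | Pagination method         | Cursor type      | Max page size |
|------------------------------|---------------------------|------------------|---------------|
| Gmail API                    | nextPageToken             | Opaque string    | 500           |
| Microsoft Graph              | @odata.nextLink           | Full URL         | 1,000         |
| IMAP (Yahoo, iCloud, hosted) | UID SEARCH + fetch ranges | Sequence numbers | No page limit |
| EWS (legacy Exchange)        | IndexedPageItemView       | Numeric offset   | 1,000         |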
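To make the fork concrete, the sketch below drives the Gmail and Microsoft Graph contracts with one generic generator. The `paginate` helper and its parameters are hypothetical. The field names are the ones each provider returns: `messages` and `nextPageToken` for Gmail, `value` and `@odata.nextLink` for Graph. What the cursor means still differs, and that is the part each `fetch_page` callable has to encode.

```python
def paginate(fetch_page, items_key, cursor_key):
    """Yield every item across pages.

    fetch_page(cursor) returns one decoded response; cursor is None on the
    first call. items_key and cursor_key name the provider-specific fields.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page.get(items_key, [])
        cursor = page.get(cursor_key)
        if not cursor:
            return


# Provider-specific field names: (items_key, cursor_key).
GMAIL = ("messages", "nextPageToken")   # cursor is sent back as pageToken
GRAPH = ("value", "@odata.nextLink")    # cursor is the complete URL to request next
```

A Gmail `fetch_page` would pass the cursor as `pageToken` on the same endpoint, while a Graph `fetch_page` would issue a GET against the cursor itself. The loop is shared, but each provider still needs its own request code.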

The unified page_token / next_cursor contract abstracts all four, so one pagination loop covers every connected account. Provider-specific listing guides: Microsoft, Yahoo, iCloud, IMAP, and EWS.