Connect an LLM to a user's inbox

You want an LLM to answer questions about a user’s email, draft replies in their voice, or pull the one fact buried in a thread from last Tuesday. The model can’t do any of that until it can read the inbox, and wiring a model to Gmail and Outlook directly means two OAuth apps, two message schemas, and a lot of MIME parsing before the first token reaches the prompt.

This recipe covers the plumbing between an inbox and a model: fetch messages over one API, shape them into context a model can use, and route the model’s output back through the user’s mailbox. It uses raw REST calls, so it works with any framework, not a specific agent runtime.

How do you give an LLM access to a user’s inbox?

You give an LLM inbox access in three steps: connect the account once with OAuth to get a grant, fetch messages through the Messages API to use as model context, then send the model’s output back through the same grant. The model never holds a password or token, and you can cut off its access at any time by revoking the grant. Each grant represents one connected account, and the same calls work across all six supported providers: Gmail, Outlook, Yahoo, iCloud, IMAP, and Exchange.

The flow has a clear trust boundary. Your code reads mail and calls the model, the model returns text, and your code decides whether that text becomes a draft, a send, or a search result. The model itself gets no credentials and makes no network calls to the provider.

The model needs the actual mail before it can reason about it, so start with a GET /v3/grants/{grant_id}/messages request. The endpoint returns 50 messages by default and up to 200 per page, each with a snippet field that holds the first 100 characters of the body. Snippets are usually enough context for classification and routing, and they keep your token bill to a fraction of what sending full bodies would cost.

The function below pulls the 20 most recent unread messages and builds a compact context string. It uses snippet instead of body on purpose: 20 full HTML bodies can run past 30,000 tokens, while 20 snippets fit in under 2,000. Fetch full bodies only for the specific message the model decides to act on.

import os
import requests

NYLAS = "https://api.us.nylas.com"
HEADERS = {"Authorization": f"Bearer {os.environ['NYLAS_API_KEY']}"}

def fetch_context(grant_id, limit=20):
    # Ask only for unread mail so the survey pass stays small.
    r = requests.get(
        f"{NYLAS}/v3/grants/{grant_id}/messages",
        headers=HEADERS,
        params={"limit": limit, "unread": "true"},
        timeout=30,
    )
    r.raise_for_status()
    lines = []
    for m in r.json()["data"]:
        sender = m["from"][0]["email"] if m.get("from") else "unknown"
        # One line per message: ID, sender, subject, snippet.
        lines.append(f"[{m['id']}] from {sender}: {m.get('subject', '')} - {m.get('snippet', '')}")
    return "\n".join(lines)
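Once the model has picked a message to act on, you need its full body in a form that is cheap to put in a prompt. The sketch below assumes you have already fetched that single message as a dict (for example from GET /v3/grants/{grant_id}/messages/{message_id}) and that its body field holds HTML. The helper names html_to_text and body_for_prompt are illustrative, and the regex-based stripping is a rough pass, not a full HTML parser.

```python
import re
from html import unescape

def html_to_text(html):
    """Rough HTML-to-text pass: drop script/style blocks, strip tags, collapse whitespace."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", unescape(text)).strip()

def body_for_prompt(message, max_chars=8000):
    """Return plain text for one message dict, truncated to max_chars.

    Assumes the message dict carries its HTML under the "body" key.
    """
    return html_to_text(message.get("body") or "")[:max_chars]
```

The max_chars cap is a safety net for newsletters and long quoted threads; tune it to your model's budget.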

With the context built, the model call is an ordinary chat completion: the system prompt states the task, and the inbox context goes in as the user turn. Ask the model to return structured output, such as JSON with a message ID and an action, so your code can act on the result deterministically instead of parsing prose. A run of 20 messages plus instructions stays under 2,000 tokens and costs about a tenth of a cent with GPT-4o-mini.

The snippet below asks the model which messages need a reply and returns a JSON array. Validate every ID the model returns against the IDs you actually sent, because models occasionally invent identifiers. Drop anything that doesn’t match before you act on it.

import json
from openai import OpenAI

client = OpenAI()

def pick_replies(context, valid_ids):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content":
                "Return JSON {\"reply\": [ids]} for messages that need a human reply."},
            {"role": "user", "content": context},
        ],
    )
    ids = json.loads(resp.choices[0].message.content).get("reply", [])
    # Models occasionally invent IDs; keep only ones from the inbox you fetched.
    return [i for i in ids if i in valid_ids]
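pick_replies needs a set of valid IDs, but fetch_context as written returns only the context string. One way to close that gap is to build both from the same message list, and to make the validation step tolerate malformed model output. The sketch below is one possible shape; format_context and keep_known_ids are hypothetical helper names, and the "reply" key matches the system prompt used above.

```python
import json

def format_context(messages):
    """Return (context_string, valid_ids) built from a list of message dicts."""
    lines, valid_ids = [], set()
    for m in messages:
        sender = m["from"][0]["email"] if m.get("from") else "unknown"
        lines.append(f"[{m['id']}] from {sender}: {m.get('subject', '')} - {m.get('snippet', '')}")
        valid_ids.add(m["id"])
    return "\n".join(lines), valid_ids

def keep_known_ids(raw_json, valid_ids):
    """Parse the model's JSON and keep only IDs that were actually sent, in order, without repeats."""
    try:
        parsed = json.loads(raw_json)
    except (json.JSONDecodeError, TypeError):
        return []  # Malformed or missing output: act on nothing.
    ids = parsed.get("reply", []) if isinstance(parsed, dict) else []
    if not isinstance(ids, list):
        return []
    seen, kept = set(), []
    for i in ids:
        if isinstance(i, str) and i in valid_ids and i not in seen:
            seen.add(i)
            kept.append(i)
    return kept
```

Returning an empty list on bad output means a confused model results in no action, which is the safer failure mode for mail.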

Once the model decides what to do, your code carries it out through the grant. For a reply, create a draft with POST /v3/grants/{grant_id}/drafts so a human reviews it before it leaves, or send directly with POST /v3/grants/{grant_id}/messages/send for fully automated flows. Drafts land in the user’s real Drafts folder, so the review happens in the mail client the user already uses.

The function below drafts a reply the model wrote. Defaulting to a draft rather than a send is the single most important safety choice here, because a wrong auto-send goes to a real person and can’t be recalled. Gmail consumer accounts also cap sends near 500 messages a day, so an auto-send loop can exhaust a user’s quota fast. Switch to messages/send only after the draft path has proven itself on real mail.

def draft_reply(grant_id, to_email, subject, body):
    # Create a draft, not a send, so a human reviews it first.
    r = requests.post(
        f"{NYLAS}/v3/grants/{grant_id}/drafts",
        headers=HEADERS,
        json={"to": [{"email": to_email}], "subject": f"Re: {subject}", "body": body},
    )
    r.raise_for_status()
    return r.json()["data"]["id"]
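Two details are worth handling when you build the draft payload: a subject that already starts with "Re:" should not get a second prefix, and a reply usually should stay attached to the original thread. The sketch below builds the JSON body only. It assumes the drafts endpoint accepts a reply_to_message_id field for threading; check that field name against the API reference before relying on it. reply_subject and reply_payload are illustrative names.

```python
def reply_subject(subject):
    """Prefix "Re: " unless the subject already carries it (case-insensitive)."""
    subject = (subject or "").strip()
    return subject if subject.lower().startswith("re:") else f"Re: {subject}"

def reply_payload(to_email, subject, body, reply_to_message_id=None):
    """Build the JSON body for a draft or send request."""
    payload = {
        "to": [{"email": to_email}],
        "subject": reply_subject(subject),
        "body": body,
    }
    if reply_to_message_id:
        # Assumption: this field links the draft to the original message's thread.
        payload["reply_to_message_id"] = reply_to_message_id
    return payload
```

You could pass the result as the json argument in draft_reply in place of the inline dict.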

Keep the model’s context window under control


Inbox data grows without bound, but a context window doesn’t, so budget tokens deliberately. A typical email body is 500 to 2,000 tokens once you strip HTML, and a 128K-token model still degrades in accuracy long before you fill it. Send snippets for the survey pass and full bodies only for the one or two messages the model commits to acting on.
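A token budget is easier to enforce when it is a number in code. The sketch below trims a list of context lines to a budget using a rough estimate of four characters per token. That ratio is an assumption that holds loosely for English text; swap in your model's real tokenizer if you need accurate counts. Both function names are hypothetical.

```python
def estimate_tokens(text):
    """Very rough token estimate: about 4 characters per token, minimum 1."""
    return max(1, len(text) // 4)

def trim_to_budget(lines, max_tokens=2000):
    """Keep lines in order until adding the next one would exceed max_tokens."""
    kept, used = [], 0
    for line in lines:
        cost = estimate_tokens(line)
        if used + cost > max_tokens:
            break
        kept.append(line)
        used += cost
    return kept
```

Because the lines are kept in order, pass them newest-first if recent mail matters most.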

Three tactics keep context lean. Filter at the API with unread=true or a date range so you never fetch mail the model doesn’t need. Cap the survey at 20 to 50 messages per run. And for long threads, fetch the thread once and summarize it before it enters the prompt, covered in Summarize email threads with AI.
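The first two tactics can live in one helper that builds the query parameters for the messages request. This is a sketch under assumptions: it uses received_after as the date-range parameter, taking a Unix timestamp, which you should confirm against the Messages API reference, and survey_params is a hypothetical name. The cap of 50 mirrors the survey size suggested above.

```python
import time

def survey_params(limit=20, unread_only=True, max_age_days=None, now=None):
    """Build query params for the survey fetch, clamping limit to 1..50."""
    params = {"limit": max(1, min(limit, 50))}
    if unread_only:
        params["unread"] = "true"
    if max_age_days is not None:
        now = time.time() if now is None else now
        # Assumption: received_after takes a Unix timestamp in seconds.
        params["received_after"] = int(now - max_age_days * 86400)
    return params
```

Pass the result as the params argument of the GET request so the filtering happens before any mail reaches your code.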

Email is among the most sensitive data a user owns, so decide early what leaves your infrastructure. Sending mail to OpenAI or Anthropic is fine for many apps, but regulated workloads may need a local model. Swapping in a self-hosted model is a base-URL change, since most local runtimes expose an OpenAI-compatible endpoint, and the rest of this recipe stays identical.
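The base-URL swap can be driven by configuration so the same code runs against a hosted or a local model. The sketch below returns keyword arguments for the OpenAI client constructor, which accepts base_url and api_key. The environment variable names LLM_BASE_URL and LLM_API_KEY are made up for this example, and the placeholder key exists only because the client requires a non-empty value even when the local server ignores it.

```python
import os

def llm_client_kwargs(env=None):
    """Return constructor kwargs: empty for the hosted default, base_url/api_key for a local runtime."""
    env = os.environ if env is None else env
    base_url = env.get("LLM_BASE_URL")  # Hypothetical variable name.
    if not base_url:
        return {}
    return {"base_url": base_url, "api_key": env.get("LLM_API_KEY", "not-needed")}

# Usage, assuming the openai package is installed:
#   from openai import OpenAI
#   client = OpenAI(**llm_client_kwargs())
```

With LLM_BASE_URL set to something like http://localhost:11434/v1, requests go to the local runtime; unset, the client falls back to its defaults. You will also need to change the model name to one the local runtime serves.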

Webhook delivery is at-least-once, so if you drive the model from a message.created webhook rather than polling, dedupe on the message id before you call the model. Acting twice on the same message means two drafts or two sends. The webhook setup is in Receive a new-email webhook.
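Deduplication can be as small as a bounded set of message IDs that you check before calling the model. The sketch below is an in-memory version with a hypothetical SeenMessages class. It only works within a single process and forgets everything on restart, so a multi-worker deployment would need a shared store instead.

```python
from collections import OrderedDict

class SeenMessages:
    """Bounded, in-memory record of message IDs that have already been handled."""

    def __init__(self, max_size=10_000):
        self._seen = OrderedDict()
        self._max_size = max_size

    def first_time(self, message_id):
        """Return True the first time an ID is seen, False on repeats."""
        if message_id in self._seen:
            return False
        self._seen[message_id] = True
        if len(self._seen) > self._max_size:
            self._seen.popitem(last=False)  # Evict the oldest ID.
        return True
```

In a webhook handler, call first_time with the message id and return early when it is False.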

Provider send limits still apply to model-driven mail. Gmail consumer accounts cap near 500 messages a day and Workspace near 2,000, so an over-eager agent can exhaust a user’s quota fast. Default to drafts, rate-limit sends, and log every action the model takes for audit.
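A simple guard can enforce both the rate limit and the audit log in one place. The sketch below keeps a per-grant daily counter and logs every decision with the standard logging module. DailySendBudget and the default of 50 sends per day are illustrative choices, set well below the provider caps mentioned above, and the counter lives in memory only.

```python
import logging
from datetime import date

log = logging.getLogger("inbox_agent")

class DailySendBudget:
    """Per-grant daily cap on model-driven sends, with an audit log line per decision."""

    def __init__(self, max_per_day=50):
        self.max_per_day = max_per_day
        self._day = None
        self._counts = {}

    def allow(self, grant_id, action, message_id, today=None):
        """Return True and count the action if the grant is under its cap for today."""
        today = today or date.today()
        if today != self._day:
            self._day, self._counts = today, {}  # New day: reset all counters.
        used = self._counts.get(grant_id, 0)
        allowed = used < self.max_per_day
        if allowed:
            self._counts[grant_id] = used + 1
        log.info("grant=%s action=%s message=%s allowed=%s", grant_id, action, message_id, allowed)
        return allowed
```

Call allow before every send and fall back to creating a draft when it returns False.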