# Connect an LLM to a user's inbox

Source: https://developer.nylas.com/docs/cookbook/ai/connect-llm-to-inbox/

You want an LLM to answer questions about a user's email, draft replies in their voice, or pull the one fact buried in a thread from last Tuesday. The model can't do any of that until it can read the inbox, and wiring a model to Gmail and Outlook directly means two OAuth apps, two message schemas, and a lot of MIME parsing before the first token reaches the prompt.

This recipe covers the plumbing between an inbox and a model: fetch messages over one API, shape them into context a model can use, and route the model's output back through the user's mailbox. It uses raw REST calls, so it works with any framework, not a specific agent runtime.

## How do you give an LLM access to a user's inbox?

You give an LLM inbox access in three steps. First, connect the account once with OAuth to get a grant. Second, fetch messages through the [Messages API](/docs/reference/api/messages/) to use as model context. Third, send the model's output back through the same grant. The model never holds a password or token, so revoking the grant revokes its access. The same grant-based flow works for all six supported providers: Gmail, Outlook, Yahoo, iCloud, IMAP, and Exchange.

The flow has a clear trust boundary. Your code reads mail and calls the model, the model returns text, and your code decides whether that text becomes a draft, a send, or a search result. The model itself gets no credentials and makes no network calls to the provider.
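A minimal sketch of that boundary in code, assuming you prompt the model for JSON shaped like `{"action": ..., "message_id": ..., "text": ...}`. That schema and the `gate_action` / `ALLOWED_ACTIONS` names are hypothetical, not a Nylas or OpenAI format. The gate accepts only allow-listed actions that target IDs your code actually fetched. It deliberately leaves `send` off the list, in line with the draft-first default this recipe recommends.

```python
from __future__ import annotations

import json

# Hypothetical allow-list: the only actions model output may trigger.
# "send" is intentionally absent so every reply gets human review first.
ALLOWED_ACTIONS = {"draft", "ignore"}


def gate_action(model_output: str, valid_ids: set[str]) -> dict | None:
    """Turn raw model text into a vetted action, or return None if any check fails.

    Assumes the model was prompted for JSON shaped like
    {"action": "draft", "message_id": "<id>", "text": "<reply body>"}.
    """
    try:
        proposal = json.loads(model_output)
    except json.JSONDecodeError:
        return None  # Prose or malformed JSON never becomes an action.
    if not isinstance(proposal, dict):
        return None
    action = proposal.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        return None
    message_id = proposal.get("message_id")
    # Reject IDs that are missing, not strings, or not from the fetched inbox.
    if not isinstance(message_id, str) or message_id not in valid_ids:
        return None
    return {"action": action, "message_id": message_id, "text": str(proposal.get("text", ""))}
```

Only a non-`None` result should ever reach a Nylas API call. Everything else is logged and dropped.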

## Fetch recent messages as model context

The model needs the actual mail before it can reason about it, so start with a `GET /v3/grants/{grant_id}/messages` request. The endpoint returns 50 messages by default and up to 200 per page, each with a `snippet` field that holds the first 100 characters of the body. Snippets are usually enough context for classification and routing, and they keep your token bill at a fraction of what full bodies would cost.

The function below pulls the 20 most recent unread messages and builds a compact context string. It uses `snippet` instead of `body` on purpose: 20 full HTML bodies can run past 30,000 tokens, while 20 snippets fit in under 2,000. Fetch full bodies only for the specific message the model decides to act on.

```python
import os

import requests

NYLAS = "https://api.us.nylas.com"
HEADERS = {"Authorization": f"Bearer {os.environ['NYLAS_API_KEY']}"}

def fetch_context(grant_id, limit=20):
    """Return unread messages as one compact line each: ID, sender, subject, snippet."""
    r = requests.get(
        f"{NYLAS}/v3/grants/{grant_id}/messages",
        headers=HEADERS,
        params={"limit": limit, "unread": "true"},
        timeout=30,
    )
    r.raise_for_status()
    lines = []
    for m in r.json()["data"]:
        # Tolerate messages that lack a sender, subject, or snippet.
        sender = m["from"][0]["email"] if m.get("from") else "unknown"
        lines.append(f"[{m['id']}] from {sender}: {m.get('subject', '')} - {m.get('snippet', '')}")
    return "\n".join(lines)
```
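When one page isn't enough (pages cap at 200 messages), you have to follow the cursor. The sketch below assumes the v3 list response carries a `next_cursor` field and that the next request passes it back as the `page_token` query parameter; confirm both in the Messages API reference. The HTTP call is injected as a hypothetical `fetch_page` callable so the paging logic stays testable without network access.

```python
from __future__ import annotations

from typing import Callable, Iterator, Optional

# fetch_page(page_token) -> decoded JSON body of one list response.
FetchPage = Callable[[Optional[str]], dict]


def iter_messages(fetch_page: FetchPage, max_messages: int) -> Iterator[dict]:
    """Yield messages across pages, stopping at max_messages or the last page."""
    token: Optional[str] = None
    yielded = 0
    while yielded < max_messages:
        page = fetch_page(token)
        for message in page.get("data", []):
            yield message
            yielded += 1
            if yielded >= max_messages:
                return  # Stop without requesting another page.
        token = page.get("next_cursor")
        if not token:
            return  # No cursor means this was the last page.
```

In production, `fetch_page` could be a small closure around `requests.get` that sends `limit` and adds `page_token` only when it isn't `None`.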

## Pass the inbox context to GPT or Claude

With the context built, the model call is an ordinary chat completion: a system prompt that states the task, with the inbox context as the user turn. Ask the model to return structured output, such as JSON with a message ID and an action, so your code can act on the result deterministically instead of parsing prose. A run of 20 messages plus instructions stays under 2,000 tokens and costs well under a tenth of a cent with GPT-4o-mini.

The snippet below asks the model which messages need a reply and returns a JSON array. Validate every ID the model returns against the IDs you actually sent, because models occasionally invent identifiers. Drop anything that doesn't match before you act on it.

```python
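import json

from openai import OpenAI

client = OpenAI()  # Reads OPENAI_API_KEY from the environment.

def pick_replies(context, valid_ids):
    """Ask the model which messages need a reply; return only IDs that were in the context."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        # JSON mode requires the prompt itself to mention JSON, as the system prompt does.
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content":
                "Return JSON {\"reply\": [ids]} for messages that need a human reply."},
            {"role": "user", "content": context},
        ],
    )
    ids = json.loads(resp.choices[0].message.content).get("reply", [])
    if not isinstance(ids, list):
        return []
    # Models occasionally invent IDs; keep only ones from the inbox you fetched.
    valid = set(valid_ids)
    return [i for i in ids if isinstance(i, str) and i in valid]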
```
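`fetch_context` returns only the context string, but `pick_replies` also needs the IDs you sent. One way to close that gap is a hypothetical `build_context` helper that takes the `data` list from a Messages API response and returns both. It is a sketch that mirrors the formatting used above.

```python
from __future__ import annotations


def build_context(messages: list[dict]) -> tuple[str, set[str]]:
    """Format messages one per line and collect their IDs for later validation."""
    lines = []
    ids = set()
    for m in messages:
        senders = m.get("from") or []
        sender = senders[0].get("email", "unknown") if senders else "unknown"
        lines.append(f"[{m['id']}] from {sender}: {m.get('subject', '')} - {m.get('snippet', '')}")
        ids.add(m["id"])
    return "\n".join(lines), ids
```

Usage: `context, ids = build_context(r.json()["data"])`, then `pick_replies(context, ids)`. The ID set comes from the same response as the prompt, so validation can never drift from what the model actually saw.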

## Let the model act through the mailbox

Once the model decides what to do, your code carries it out through the grant. For a reply, create a draft with `POST /v3/grants/{grant_id}/drafts` so a human reviews it before it leaves, or send directly with `POST /v3/grants/{grant_id}/messages/send` for fully automated flows. Drafts land in the user's real Drafts folder, so the review happens in the mail client the user already uses.

The function below drafts a reply the model wrote. Defaulting to a draft rather than a send is the single most important safety choice here. A wrong auto-send goes to a real person and can't be recalled, and an auto-send loop also eats into the provider send quotas described below. Switch to `messages/send` only after the draft path has proven itself on real mail.

```python
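def draft_reply(grant_id, to_email, subject, body):
    # Avoid stacking prefixes ("Re: Re: ...") when replying to a reply.
    if not subject.lower().startswith("re:"):
        subject = f"Re: {subject}"
    # Creates a draft in the user's Drafts folder; nothing is sent yet.
    r = requests.post(
        f"{NYLAS}/v3/grants/{grant_id}/drafts",
        headers=HEADERS,
        json={"to": [{"email": to_email}], "subject": subject, "body": body},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()["data"]["id"]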
```
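A reply usually needs more than a recipient and a subject. It should go to the `reply_to` address when one is set, and it should stay in the original thread. The sketch below builds a drafts request body from the original message object. It assumes the v3 drafts endpoint accepts a `reply_to_message_id` field for threading; check the Drafts API reference before relying on it. `build_reply_draft` and `reply_subject` are hypothetical helpers.

```python
from __future__ import annotations


def reply_subject(subject: str | None) -> str:
    """Add a single "Re: " prefix, avoiding "Re: Re: ..." chains."""
    subject = (subject or "").strip()
    return subject if subject.lower().startswith("re:") else f"Re: {subject}"


def build_reply_draft(original: dict, body: str) -> dict:
    """Build a drafts request body replying to `original` (a Messages API object).

    Assumption: the drafts endpoint threads the reply via `reply_to_message_id`.
    """
    # Prefer an explicit Reply-To address over the From address.
    recipients = original.get("reply_to") or original.get("from") or []
    if not recipients:
        raise ValueError("original message has no address to reply to")
    return {
        "to": [{"email": recipients[0]["email"]}],
        "subject": reply_subject(original.get("subject")),
        "body": body,
        "reply_to_message_id": original["id"],
    }
```

The payload would be posted the same way as in `draft_reply`, with `json=build_reply_draft(message, body)`.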

## Keep the model's context window under control

Inbox data grows without bound, but a context window doesn't, so budget tokens deliberately. A typical email body is 500 to 2,000 tokens once you strip HTML, and even a model with a 128K-token window tends to lose accuracy long before the window is full. Send snippets for the survey pass and full bodies only for the one or two messages the model commits to acting on.
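For the full bodies you do send, HTML stripping can be done with the Python standard library. This sketch uses `html.parser.HTMLParser`, skips `<script>` and `<style>` contents, inserts line breaks at common block tags, and collapses whitespace. It is a rough converter, not a faithful renderer; a dedicated library may handle quoted replies and signatures better.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, ignoring <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &.
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in ("br", "p", "div", "li", "tr"):
            self.parts.append("\n")  # Keep paragraph boundaries readable.

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r"[ \t]+", " ", text)     # Collapse runs of spaces and tabs.
    text = re.sub(r"\s*\n\s*", "\n", text)  # Collapse blank lines and indentation.
    return text.strip()
```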

Three tactics keep context lean. Filter at the API with `unread=true` or a date range so you never fetch mail the model doesn't need. Cap the survey at 20 to 50 messages per run. And for long threads, fetch the thread once and summarize it before it enters the prompt, as covered in [Summarize email threads with AI](/docs/cookbook/ai/summarize-email-threads/).
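A message-count cap can also be paired with a token budget. The sketch below keeps survey lines in order until either limit is reached. It estimates tokens with the common rule of thumb of about four characters per English token, which is an assumption; for exact counts, use the model's own tokenizer (for OpenAI models, the `tiktoken` library). `approx_tokens` and `cap_context` are hypothetical names.

```python
from __future__ import annotations


def approx_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token, rounded up.
    return (len(text) + 3) // 4


def cap_context(lines: list[str], max_tokens: int = 2000, max_lines: int = 50) -> str:
    """Keep lines in order until the token budget or the line cap is hit."""
    kept = []
    used = 0
    for line in lines[:max_lines]:
        cost = approx_tokens(line) + 1  # +1 for the joining newline.
        if used + cost > max_tokens:
            break
        kept.append(line)
        used += cost
    return "\n".join(kept)
```

Because the survey is ordered newest first, cutting from the end drops the oldest messages, which is usually the right trade for triage.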

## Things to know about LLMs and inbox data

Email is among the most sensitive data a user owns, so decide early what leaves your infrastructure. Sending mail to OpenAI or Anthropic is fine for many apps, but regulated workloads may need a local model. Swapping in a self-hosted model is a base-URL change, since most local runtimes expose an OpenAI-compatible endpoint, and the rest of this recipe stays identical.
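A sketch of that swap, assuming the OpenAI Python client's `base_url` and `api_key` constructor arguments and two hypothetical environment variables, `LLM_BASE_URL` and `LLM_API_KEY`. For example, Ollama serves an OpenAI-compatible API at `http://localhost:11434/v1`. The `model` argument in `pick_replies` would also have to name a model the local runtime actually serves.

```python
from __future__ import annotations

import os
from typing import Mapping


def llm_client_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Pick a hosted or self-hosted endpoint from (hypothetical) env variables."""
    base_url = env.get("LLM_BASE_URL")
    if base_url:
        # Local runtimes often ignore the key, but the client still requires one.
        return {"base_url": base_url, "api_key": env.get("LLM_API_KEY", "unused")}
    # Empty kwargs: OpenAI() falls back to its own environment-based defaults.
    return {}
```

Usage: `client = OpenAI(**llm_client_kwargs())`. The rest of the recipe is unchanged.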

Message delivery is at-least-once, so if you drive the model from a `message.created` webhook rather than polling, dedupe on the message `id` before you call the model. Acting twice on the same message means two drafts or two sends. The webhook setup is in [Receive a new-email webhook](/docs/cookbook/use-cases/build/new-email-webhook/).
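A minimal in-memory dedupe sketch: `SeenMessages` (a hypothetical helper) remembers message IDs for a time window and reports whether each ID is new. Where the ID sits in a `message.created` payload is not shown here; take the exact path from the webhook reference. This only works within one process. Several workers would need a shared store, such as a key-value store with atomic set-if-absent and expiry.

```python
from __future__ import annotations

import time
from collections import OrderedDict


class SeenMessages:
    """Remember recently processed message IDs so redelivered webhooks are skipped."""

    def __init__(self, ttl_seconds: float = 24 * 3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        # Insertion order equals arrival order, so the oldest entry is always first.
        self._seen: OrderedDict[str, float] = OrderedDict()

    def first_time(self, message_id: str) -> bool:
        """Return True the first time an ID is seen within the TTL window."""
        now = self.clock()
        # Evict expired entries from the front.
        while self._seen:
            _, seen_at = next(iter(self._seen.items()))
            if now - seen_at < self.ttl:
                break
            self._seen.popitem(last=False)
        if message_id in self._seen:
            return False
        self._seen[message_id] = now
        return True
```

In the webhook handler, call the model only when `seen.first_time(message_id)` returns `True`.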

Provider send limits still apply to model-driven mail. Gmail consumer accounts cap near 500 messages a day and Workspace near 2,000, so an over-eager agent can exhaust a user's quota fast. Default to drafts, rate-limit sends, and log every action the model takes for audit.
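One way to combine both controls is a per-grant sliding 24-hour cap on model-initiated sends plus a structured audit line for every decision. The sketch below does that. `DailySendLimiter`, `log_action`, and the `inbox_agent.audit` logger name are hypothetical. Set the cap yourself, well below the approximate provider limits above.

```python
from __future__ import annotations

import json
import logging
import time
from collections import defaultdict, deque

audit = logging.getLogger("inbox_agent.audit")


class DailySendLimiter:
    """Sliding 24-hour cap on model-initiated sends, tracked per grant."""

    def __init__(self, max_per_day: int, clock=time.time):
        self.max_per_day = max_per_day
        self.clock = clock
        self._sent: dict[str, deque] = defaultdict(deque)

    def allow(self, grant_id: str) -> bool:
        """Record and allow a send, or refuse it if the grant is at its cap."""
        now = self.clock()
        window = self._sent[grant_id]
        while window and now - window[0] >= 86400:
            window.popleft()  # Forget sends older than 24 hours.
        if len(window) >= self.max_per_day:
            return False
        window.append(now)
        return True


def log_action(grant_id: str, action: str, message_id: str, allowed: bool) -> None:
    """Write one JSON audit line per model decision, allowed or not."""
    audit.info(json.dumps({"grant_id": grant_id, "action": action,
                           "message_id": message_id, "allowed": allowed}))
```

Calling `log_action` for refused actions too makes it possible to see when an agent keeps pushing against the cap.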

## What's next

- [Extract structured data from email with AI](/docs/cookbook/ai/extract-data-from-email/) to turn mail into typed fields
- [Summarize email threads with AI](/docs/cookbook/ai/summarize-email-threads/) for long conversations
- [Build an AI email triage agent](/docs/cookbook/agents/email-triage-agent/) for a complete classify-and-draft loop
- [Receive a new-email webhook](/docs/cookbook/use-cases/build/new-email-webhook/) to trigger the model on arrival
- [Messages API reference](/docs/reference/api/messages/) for all fetch and send parameters