Build an autonomous email agent

The triage and support recipes both stop at a draft because a human still reads every reply before it leaves. This recipe removes that human. An autonomous agent reads incoming mail, decides what to say, and sends it through the user’s mailbox without anyone checking first. That shift is small in code and large in consequence, so the bulk of this page is governance rather than model logic. The classification and reply-writing belong to the email triage agent; here you wrap that loop in five guardrails that decide whether a generated reply ever reaches a real person.

When should an agent send email without human review?

Autonomous sending fits narrow, low-stakes flows: order confirmations, scheduling acknowledgements, routine status replies. It does not fit anything that commits your company to money, contracts, or legal positions. Keep a human in the loop for those, and reserve full automation for high-volume, low-variance mail where a wrong send is recoverable.

The decision turns on blast radius. A misfired draft costs one click to fix, but an autonomous send lands in a real inbox and cannot be recalled. Before you flip the switch, measure your false-positive rate on a draft-only run for at least 14 days. If the agent would have sent something wrong even once in that window, you are not ready to remove the human. The five guardrails below assume you have cleared that bar and now need the send path itself to fail safe.

Cap how many messages the agent sends

A runaway loop is the failure that hurts most, so put a hard ceiling on sends before anything else. Track a rolling count per grant and refuse to send once the agent crosses your limit for the window. A reasonable starting point is 50 sends per hour and 200 per day per mailbox, well under provider ceilings so a bug trips your cap long before it trips theirs. Gmail consumer accounts cap near 500 messages a day and Workspace near 2,000, and hitting those gets the mailbox throttled.

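The counter below lives in your own store, separate from the model, so it holds even when the model misbehaves. It keeps one timestamp per send and counts the sends inside a sliding one-hour window and a sliding 24-hour window, which keeps the accounting simple and auditable. Use one instance per grant, and back the list with shared storage such as your database if the agent runs in more than one process or must survive restarts.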

import time

class RateCap:
    def __init__(self, per_hour=50, per_day=200):
        self.per_hour, self.per_day = per_hour, per_day
        self.sends = []  # list of unix timestamps

    def allow(self):
        now = time.time()
        self.sends = [t for t in self.sends if now - t < 86400]
        last_hour = sum(1 for t in self.sends if now - t < 3600)
        return last_hour < self.per_hour and len(self.sends) < self.per_day

    def record(self):
        now = time.time()
        self.sends = [t for t in self.sends if now - t < 86400]  # trim so the list stays bounded
        self.sends.append(now)
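
To watch the cap trip without waiting an hour, one option is to run the same sliding-window logic against an injected clock. The sketch below is a test-oriented variant, not the recipe's class: `SlidingCap`, `FakeClock`, and the `clock` parameter are hypothetical names introduced for illustration.

```python
import time


class SlidingCap:
    """Sliding-window send cap with an injectable clock (hypothetical test variant)."""

    def __init__(self, per_hour=50, per_day=200, clock=time.time):
        self.per_hour, self.per_day = per_hour, per_day
        self.clock = clock
        self.sends = []  # unix timestamps of recorded sends

    def _trim(self, now):
        # Drop anything older than 24 hours so the list stays bounded.
        self.sends = [t for t in self.sends if now - t < 86400]

    def allow(self):
        now = self.clock()
        self._trim(now)
        last_hour = sum(1 for t in self.sends if now - t < 3600)
        return last_hour < self.per_hour and len(self.sends) < self.per_day

    def record(self):
        now = self.clock()
        self._trim(now)
        self.sends.append(now)


class FakeClock:
    """Manually advanced clock so a test can move time forward."""

    def __init__(self, start=0.0):
        self.now = start

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


clock = FakeClock()
cap = SlidingCap(per_hour=2, per_day=3, clock=clock)
cap.record()
cap.record()
hourly_blocked = not cap.allow()   # two sends inside the hour: hourly cap trips
clock.advance(3600)
hourly_cleared = cap.allow()       # an hour later the hourly window is empty again
cap.record()
daily_blocked = not cap.allow()    # three sends inside 24 hours: daily cap trips
```

Injecting the clock keeps the production default (`time.time`) while letting a test exercise both limits in milliseconds.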

Restrict who the agent can email

A send cap limits volume but not direction, so pair it with an allowlist that names every recipient the agent may contact. Default deny: if an address is not on the list, the agent does not send, full stop. For broader flows, allow a domain pattern such as @yourcompany.com and keep a denylist for the executive and press addresses that must never receive automated mail; the example below blocks 2 such addresses. A blocked send costs less than a send to the wrong person.

The check runs on every outbound message, before the rate cap, so a disallowed recipient never counts against your quota. The denylist is the stricter rule, so check it first and let it win any conflict, then fall back to the exact allowlist and finally the domain allowlist.

ALLOW_DOMAINS = {"yourcompany.com"}
ALLOW_EXACT = {"[email protected]"}
DENY_EXACT = {"[email protected]", "[email protected]"}

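def recipient_allowed(email):
    addr = email.strip().lower()
    if addr in DENY_EXACT:
        return False
    if addr in ALLOW_EXACT:
        return True
    # Require exactly one "@" with a non-empty local part, so a bare
    # string such as "yourcompany.com" cannot pass the domain check.
    local, sep, domain = addr.rpartition("@")
    if not sep or not local or "@" in local:
        return False
    return domain in ALLOW_DOMAINS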
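
If a flow ever adds cc or bcc recipients, the same default-deny rule has to hold for each of them, not only the first address in `to`. The sketch below assumes the payload shape used by the send body in this recipe (lists of `{"email": ...}` objects); `blocked_recipients` and `demo_allowed` are hypothetical helpers, and the predicate stands in for `recipient_allowed`.

```python
def blocked_recipients(payload, is_allowed):
    """Return every to/cc/bcc address that fails the is_allowed predicate."""
    blocked = []
    for field in ("to", "cc", "bcc"):
        for entry in payload.get(field) or []:
            email = entry.get("email", "")
            if not is_allowed(email):
                blocked.append(email)
    return blocked


# Stand-in predicate for the demo; in the recipe this would be recipient_allowed.
def demo_allowed(email):
    return email.strip().lower().endswith("@yourcompany.com")


payload = {
    "to": [{"email": "ops@yourcompany.com"}],
    "cc": [{"email": "someone@example.org"}],
}
blocked = blocked_recipients(payload, demo_allowed)
```

Sending only when the returned list is empty keeps the allowlist a single rule applied to every address on the message.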

Prevent duplicate sends with an idempotency key

Webhook delivery is at-least-once, so a message.created webhook can fire twice for the same message, and a crash mid-loop can replay work that already ran. The agent must therefore be safe to run twice. The send endpoint accepts an Idempotency-Key header of up to 256 characters. Derive the key from the inbound message id plus the action, hashed with SHA-256 to a stable 64-character hex value, so a retry with the same inputs produces the same key and the API returns the original result instead of sending again.

A reused key with a different payload returns a 409, which is your signal that something changed underneath a retry. Treat that as a bug to investigate, not an error to swallow. The helper below sends a reply and rides the same key on every retry of that exact reply.

import hashlib, os, requests

NYLAS = "https://api.us.nylas.com"
HEADERS = {"Authorization": f"Bearer {os.environ['NYLAS_API_KEY']}"}

def idem_key(message_id, action="reply"):
    return hashlib.sha256(f"{message_id}:{action}".encode()).hexdigest()

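def send_reply(grant_id, message_id, to_email, subject, body):
    # Avoid stacking prefixes when the inbound subject already starts with "Re:".
    subject = subject or ""
    reply_subject = subject if subject.lower().startswith("re:") else f"Re: {subject}"
    r = requests.post(
        f"{NYLAS}/v3/grants/{grant_id}/messages/send",
        headers={**HEADERS, "Idempotency-Key": idem_key(message_id)},
        json={
            "to": [{"email": to_email}],
            "subject": reply_subject,
            "body": body,
            "reply_to_message_id": message_id,  # thread under the inbound message
        },
        timeout=30,  # requests has no default timeout; fail instead of hanging the loop
    )
    r.raise_for_status()  # a 409 key conflict surfaces here as an exception
    return r.json()["data"]["id"]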

Log every action the agent takes

An autonomous agent acts while nobody watches, so the audit log is how you reconstruct what happened after the fact. Write one record per decision: the inbound message id, the recipient, the model’s chosen action, whether each guardrail passed, the idempotency key, and the outcome. Keep the log append-only and never edit a record. When a customer asks why they got a reply at 3 a.m., a complete log answers in under 1 minute, where provider logs alone would mean a forensic dig.

Log the decision even when the agent declines to send, because a blocked send is often the more interesting record. The function below writes a structured line your log pipeline can index and query later.

import json, time

def audit(event):
    record = {"ts": time.time(), **event}
    with open("/var/log/email-agent/audit.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")

# audit({"message_id": mid, "to": addr, "action": "reply",
#        "allowed": True, "rate_ok": True, "key": key, "result": "sent"})

Wire in a kill switch

Every autonomous system needs an off switch that anyone can hit without a deploy, so gate the send path behind a single flag checked at the top of each cycle. Flip it and the agent keeps reading mail but stops sending, which lets you stop the bleeding in under 10 seconds during an incident and investigate without losing inbound context. Store the flag somewhere external to the process: a row in your database, a feature-flag service, or a file the operator can touch.
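
As a minimal sketch of the file-based option, the check can be as small as testing whether a stop file exists. The path below is hypothetical; choose one your operators can write to without a deploy.

```python
import os
import tempfile

KILL_FILE = "/var/run/email-agent/STOP"  # hypothetical path


def kill_switch_on(path=KILL_FILE):
    """Return True when the operator has created the stop file."""
    return os.path.exists(path)


# Demo with a temporary directory standing in for the real location.
with tempfile.TemporaryDirectory() as tmp:
    flag = os.path.join(tmp, "STOP")
    before = kill_switch_on(flag)   # no file yet: sending is allowed
    open(flag, "w").close()         # the operator "touches" the file
    after = kill_switch_on(flag)    # file present: sending must stop
```

Presence means stop, so an operator only has to create a file. Note that `os.path.exists` returns False when the path cannot be read, which fails open; a database or flag-service version should probably treat a failed lookup as "on".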

Check the switch first, then the allowlist, then the rate cap, then send. Ordering matters, because the kill switch must override everything else. The dispatcher below shows the full gate sequence with the audit log capturing each branch.

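# Assumes rate_cap = RateCap() (one per grant) and a kill_switch_on()
# helper that reads your external flag.
def dispatch(grant_id, msg, reply_body):
    addr = msg["from"][0]["email"]
    key = idem_key(msg["id"])
    base = {"message_id": msg["id"], "to": addr, "action": "reply", "key": key}

    if kill_switch_on():
        audit({**base, "result": "blocked_kill_switch"}); return
    if not recipient_allowed(addr):
        audit({**base, "result": "blocked_recipient"}); return
    if not rate_cap.allow():
        audit({**base, "result": "blocked_rate_cap"}); return

    try:
        send_reply(grant_id, msg["id"], addr, msg["subject"], reply_body)
    except requests.RequestException as exc:
        # Record the failure before re-raising so the log shows every attempt.
        audit({**base, "result": "send_failed", "error": str(exc)})
        raise
    rate_cap.record()
    audit({**base, "result": "sent"})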

Things to know about autonomous sending

Run the agent draft-only first, then dry-run the send path with every guardrail active but the final API call stubbed out. Compare what it would have sent against what a human would have approved across at least 100 messages of real inbound mail. Only after that comparison holds should you let the agent send for real, and even then start with the rate cap set to 5 per hour for the first day.
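
The comparison above can be reduced to a small report over the dry-run results. The sketch below assumes you have collected one row per inbound message with the agent's would-send decision and a human verdict; the field names, the `dry_run_report` helper, and the readiness rule (at least 100 messages, zero wrong sends) are illustrative assumptions drawn from the thresholds in this recipe.

```python
def dry_run_report(decisions):
    """Summarize a dry run.

    decisions: list of dicts with "message_id", "agent_would_send" (bool),
    and "human_approved" (bool).
    """
    would_send = [d for d in decisions if d["agent_would_send"]]
    wrong = [d["message_id"] for d in would_send if not d["human_approved"]]
    return {
        "reviewed": len(decisions),
        "would_send": len(would_send),
        "wrong_sends": wrong,
        # Ready only with enough volume and not a single wrong send.
        "ready": len(decisions) >= 100 and not wrong,
    }


decisions = [
    {"message_id": f"m{i}", "agent_would_send": True, "human_approved": True}
    for i in range(100)
]
clean = dry_run_report(decisions)
decisions[7]["human_approved"] = False  # one send a human would have rejected
flawed = dry_run_report(decisions)
```

Listing the wrong sends by message id, not only counting them, gives you the exact cases to review before the next dry run.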

Guardrails are independent layers, not a single check. The rate cap stops volume, the allowlist stops direction, idempotency stops duplicates, the log explains the past, and the kill switch stops the present. Removing any one of the five leaves a gap the other four cannot cover. Keep the model out of all of them, because a model that can talk itself past its own safety rails is not a safety rail.

What’s next

Build an AI email triage agent for the classify-and-draft loop this recipe builds on
Connect an LLM to a user’s inbox for the read-and-act plumbing
Send email at scale for provider rate limits and batching
Messages API reference for send parameters and the Idempotency-Key header