
Build an AI email triage agent

A triage agent does what you’d ask an over-caffeinated EA to do: open each unread message, decide whether it needs you in the next hour, the next day, the next week, or never, draft replies for the urgent ones, and bulk-archive the rest. The pattern below runs as a cron job every fifteen minutes and costs roughly $0.002 per 100 emails using GPT-4o-mini for classification.

Most of the work happens in two prompts and three CLI calls.

| Bucket | Meaning | Action |
|--------|---------|--------|
| URGENT | Production incident, executive ask | Draft a reply within the hour |
| ACTION | Code review, meeting follow-up | Draft a reply same-day |
| FYI | Status update, FYI thread | Leave it alone |
| NOISE | Newsletter, marketing, automated alert | Archive |

Four buckets is the right number. Three loses fidelity (everything ends up in “important”); with five, the model starts confusing adjacent categories.

Run it with temperature=0 and max_tokens=10 — you want a deterministic answer and a single category token, not a paragraph. The model gets sender + subject + a 200-char snippet, not the full body. That’s plenty for >90% accuracy and it keeps the per-email cost trivial.
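Assembling that input is a plain slice over the Nylas message fields the script below reads (`payload` is my name for the helper, not part of the script):

```python
def payload(msg, max_snippet=200):
    """Build the minimal classification input: sender, subject, truncated snippet."""
    return {
        "sender": msg["from"][0]["email"],
        "subject": msg["subject"],
        "snippet": msg["snippet"][:max_snippet],
    }
```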

The classification prompt:

```text
You triage email into one of four categories:

URGENT — production incidents, executive requests; reply within 1 hour
ACTION — code reviews, meeting follow-ups; reply same day
FYI — informational, no response needed
NOISE — newsletters, marketing, automated notifications

From: {sender}
Subject: {subject}
Snippet: {snippet}

Return ONLY the category name. Nothing else.
```
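Stored as a template string, this is the `CLASSIFY_PROMPT` constant the script formats per message (a direct transcription of the prompt above):

```python
CLASSIFY_PROMPT = """You triage email into one of four categories:

URGENT — production incidents, executive requests; reply within 1 hour
ACTION — code reviews, meeting follow-ups; reply same day
FYI — informational, no response needed
NOISE — newsletters, marketing, automated notifications

From: {sender}
Subject: {subject}
Snippet: {snippet}

Return ONLY the category name. Nothing else."""
```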

Always validate the output against the four valid strings — LLMs occasionally invent a category. Fall back to FYI on anything unrecognized.
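A slightly more defensive version of that check also absorbs case and whitespace drift before falling back (a sketch; `normalize` is my helper name, not part of the script below):

```python
VALID = {"URGENT", "ACTION", "FYI", "NOISE"}

def normalize(raw: str) -> str:
    """Map raw model output onto a valid bucket; default to FYI."""
    cat = raw.strip().upper()
    return cat if cat in VALID else "FYI"
```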

```python
import json
import subprocess

from openai import OpenAI

# CLASSIFY_PROMPT and DRAFT_PROMPT are the template strings shown in this post.
VALID = {"URGENT", "ACTION", "FYI", "NOISE"}
client = OpenAI()


def fetch_unread(limit=50):
    """Pull unread messages via the Nylas CLI as parsed JSON."""
    out = subprocess.run(
        ["nylas", "email", "list", "--unread", "--limit", str(limit), "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)


def classify(msg):
    """Bucket one message with a single cheap, deterministic call."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        max_tokens=10,
        messages=[{"role": "user", "content": CLASSIFY_PROMPT.format(
            sender=msg["from"][0]["email"],
            subject=msg["subject"],
            snippet=msg["snippet"][:200],
        )}],
    )
    cat = resp.choices[0].message.content.strip()
    return cat if cat in VALID else "FYI"  # fall back on anything unrecognized


def draft_reply(msg):
    """Draft (never send) a reply with the stronger model."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.7,
        messages=[{"role": "user", "content": DRAFT_PROMPT.format(
            sender=msg["from"][0]["email"],
            subject=msg["subject"],
            body=msg["body"],
        )}],
    )
    body = resp.choices[0].message.content
    subprocess.run(
        ["nylas", "email", "draft",
         "--to", msg["from"][0]["email"],
         "--subject", "Re: " + msg["subject"],
         "--body", body, "--json"],
        check=True,
    )


def archive(msg):
    subprocess.run(["nylas", "email", "archive", msg["id"]], check=True)


for msg in fetch_unread():
    cat = classify(msg)
    if cat in ("URGENT", "ACTION"):
        draft_reply(msg)
    elif cat == "NOISE":
        archive(msg)
    # FYI: do nothing
```

Drafts land in your drafts folder — the agent never sends. You review, edit, and hit send (or not).

The drafting prompt:

```text
Write a short, professional reply to this email. Three sentences max. Be direct.

From: {sender}
Subject: {subject}
Body: {body}

Reply:
```
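As with classification, the script expects this as a template string; `DRAFT_PROMPT` is a direct transcription:

```python
DRAFT_PROMPT = """Write a short, professional reply to this email. Three sentences max. Be direct.

From: {sender}
Subject: {subject}
Body: {body}

Reply:"""
```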

Higher temperature here (0.7) lets the model produce natural prose. The “three sentences max” is load-bearing — without it, you’ll get drafts that read like a politely overcompensating intern.

```
*/15 * * * * /usr/bin/python3 /opt/triage/triage.py >> /var/log/triage.log 2>&1
```
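One guard worth adding: if a run ever stalls past fifteen minutes (API timeout, big backlog), the next one starts on top of it. Wrapping the job in `flock -n` (from util-linux) turns overlapping runs into no-ops; the lock-file path is arbitrary:

```
*/15 * * * * flock -n /tmp/triage.lock /usr/bin/python3 /opt/triage/triage.py >> /var/log/triage.log 2>&1
```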

The whole thing is idempotent — only --unread messages are pulled, and once you read a draft (or the original) it falls out of subsequent runs.

GPT-4o-mini classification: ~$0.15 per 1M input tokens. A 200-char snippet plus the prompt is 150 tokens. 100 emails ≈ 15K tokens ≈ $0.002. Drafting uses GPT-4o ($2.50 / 1M input) but only on the URGENT + ACTION subset — typically under 20% of the inbox. A heavy day at 200 unread emails costs roughly a nickel.
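Spelled out as arithmetic (the ~500 input tokens per draft is my assumption; the other numbers are from above):

```python
MINI_INPUT = 0.15 / 1_000_000   # $/input token, GPT-4o-mini (classification)
FULL_INPUT = 2.50 / 1_000_000   # $/input token, GPT-4o (drafting)

emails = 200                                         # a heavy day
classify_cost = emails * 150 * MINI_INPUT            # every unread email gets classified
draft_cost = int(emails * 0.20) * 500 * FULL_INPUT   # ~20% drafted, ~500 input tokens each (assumption)
total = classify_cost + draft_cost                   # lands around $0.05: "roughly a nickel"
```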

For mail you don’t want hitting OpenAI or Anthropic, point the client at a local Ollama server instead:

```python
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
# model="llama3.1" or similar
```

Llama 3.1 classifies almost as well as GPT-4o-mini for this task. Drafting quality drops noticeably with smaller models — keep that on a hosted LLM unless you’re running a 70B+ parameter local model.

  • Only --unread. The agent never re-classifies what it has already handled: reviewing a draft doesn’t mark the original message unread, so the next cron run skips it.
  • No auto-send. Always land in drafts. The cost of a wrong send (to the wrong person, at the wrong tone) is way higher than the friction of one extra click.
  • Tune per inbox. Eng inboxes hit URGENT differently than sales inboxes. Customize the four categories and the prompt to your context.