A triage agent does what you’d ask an over-caffeinated EA to do: open each unread message, decide whether it needs you in the next hour, the next day, the next week, or never, draft replies for the urgent ones, and bulk-archive the rest. The pattern below runs as a cron job every fifteen minutes and costs roughly $0.002 per 100 emails using GPT-4o-mini for classification.
Most of the work happens in two prompts and three CLI calls.
## The four buckets

| Bucket | Meaning | Action |
|---|---|---|
| URGENT | Production incident, executive ask | Draft a reply within the hour |
| ACTION | Code review, meeting follow-up | Draft a reply same-day |
| FYI | Status update, informational thread | Leave it alone |
| NOISE | Newsletter, marketing, automated alert | Archive |
Four is the right number. Three buckets lose fidelity (everything ends up in “important”); with five, the model starts confusing categories.
## The classification prompt

Run it with temperature=0 and max_tokens=10 — you want deterministic output and a single category name, not a paragraph. The model sees sender + subject + a 200-char snippet, not the full body. That’s plenty for >90% accuracy, and it keeps the per-email cost trivial.
```text
You triage email into one of four categories:

URGENT — production incidents, executive requests; reply within 1 hour
ACTION — code reviews, meeting follow-ups; reply same day
FYI — informational, no response needed
NOISE — newsletters, marketing, automated notifications

From: {sender}
Subject: {subject}
Snippet: {snippet}

Return ONLY the category name. Nothing else.
```

Always validate the output against the four valid strings — LLMs occasionally invent a category. Fall back to FYI on anything unrecognized.
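A minimal sketch of that validation step (`parse_category` is a hypothetical helper; it also strips stray quotes and punctuation before the membership check, since models sometimes wrap the answer):

```python
VALID = {"URGENT", "ACTION", "FYI", "NOISE"}

def parse_category(raw: str) -> str:
    """Normalize model output; fall back to FYI on anything unrecognized."""
    cleaned = raw.strip().strip(".\"'").upper()
    return cleaned if cleaned in VALID else "FYI"
```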
## The loop

```python
import json, subprocess

from openai import OpenAI

# CLASSIFY_PROMPT and DRAFT_PROMPT are the two prompt templates shown in this post.
VALID = {"URGENT", "ACTION", "FYI", "NOISE"}
client = OpenAI()

def fetch_unread(limit=50):
    out = subprocess.run(
        ["nylas", "email", "list", "--unread", "--limit", str(limit), "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)

def classify(msg):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        max_tokens=10,
        messages=[{"role": "user", "content": CLASSIFY_PROMPT.format(
            sender=msg["from"][0]["email"],
            subject=msg["subject"],
            snippet=msg["snippet"][:200],
        )}],
    )
    cat = resp.choices[0].message.content.strip()
    return cat if cat in VALID else "FYI"

def draft_reply(msg):
    resp = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.7,
        messages=[{"role": "user", "content": DRAFT_PROMPT.format(
            sender=msg["from"][0]["email"],
            subject=msg["subject"],
            body=msg["body"],
        )}],
    )
    body = resp.choices[0].message.content
    subprocess.run(
        ["nylas", "email", "draft",
         "--to", msg["from"][0]["email"],
         "--subject", "Re: " + msg["subject"],
         "--body", body, "--json"],
        check=True,
    )

def archive(msg):
    subprocess.run(["nylas", "email", "archive", msg["id"]], check=True)

for msg in fetch_unread():
    cat = classify(msg)
    if cat in ("URGENT", "ACTION"):
        draft_reply(msg)
    elif cat == "NOISE":
        archive(msg)
    # FYI: do nothing
```

Drafts land in your drafts folder — the agent never sends. You review, edit, and hit send (or not).
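Transient API failures (rate limits, timeouts) will otherwise kill a cron run mid-inbox. A hedged sketch of a generic retry wrapper; `with_retries` is a hypothetical helper, not part of the OpenAI SDK:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Retry a flaky zero-arg callable with exponential backoff: 1s, 2s, 4s..."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: let the error surface
            time.sleep(base_delay * 2 ** i)
```

In the loop you would wrap each call, e.g. `cat = with_retries(lambda: classify(msg))`, so one rate-limited email doesn’t abort the remaining 99.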
## The drafting prompt

```text
Write a short, professional reply to this email. Three sentences max. Be direct.

From: {sender}
Subject: {subject}
Body: {body}

Reply:
```

Higher temperature here (0.7) lets the model produce natural prose. The “three sentences max” is load-bearing — without it, you’ll get drafts that read like a politely overcompensating intern.
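If you want a hard guardrail in addition to the prompt instruction, you can truncate drafts after generation. A rough sketch; `cap_sentences` is a hypothetical helper, and naive splitting will mishandle abbreviations like "e.g.":

```python
import re

def cap_sentences(text: str, limit: int = 3) -> str:
    """Keep at most `limit` sentences, splitting on ., !, ? boundaries."""
    sentences = re.findall(r"[^.!?]+[.!?]?", text.strip())
    return "".join(sentences[:limit]).strip()
```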
## Cron it

```cron
*/15 * * * * /usr/bin/python3 /opt/triage/triage.py >> /var/log/triage.log 2>&1
```

The whole thing is idempotent — only `--unread` messages are pulled, and once you read a draft (or the original) it falls out of subsequent runs.
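Cron itself won’t stop a slow run from overlapping the next one, though. A minimal Unix-only sketch using `fcntl.flock` (the lock path is illustrative):

```python
import fcntl
import sys

def acquire_lock(path="/tmp/triage.lock"):
    """Exit quietly if a previous triage run still holds the lock."""
    lock = open(path, "w")
    try:
        fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        sys.exit(0)  # another run is in progress; this one bows out
    return lock  # keep the returned handle alive for the whole run
```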
## Cost math

GPT-4o-mini classification: ~$0.15 per 1M input tokens. A 200-char snippet plus the prompt is about 150 tokens, so 100 emails ≈ 15K tokens ≈ $0.002. Drafting uses GPT-4o ($2.50 / 1M input) but only on the URGENT + ACTION subset — typically under 20% of the inbox. A heavy day at 200 unread emails costs roughly a nickel.
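The classification arithmetic above, as a back-of-envelope check:

```python
PRICE_PER_M_TOKENS = 0.15   # GPT-4o-mini input pricing, USD per 1M tokens
TOKENS_PER_EMAIL = 150      # prompt + 200-char snippet, rough estimate
emails = 100

cost = emails * TOKENS_PER_EMAIL / 1_000_000 * PRICE_PER_M_TOKENS
print(f"${cost:.4f} for {emails} emails")  # about $0.002
```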
## Privacy mode

For mail you don’t want hitting OpenAI / Anthropic, swap to a local Ollama:

```python
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
# model="llama3.1" or similar
```

Llama 3.1 classifies almost as well as GPT-4o-mini for this task. Drafting quality drops noticeably with smaller models — keep that on a hosted LLM unless you’re running a 70B+ parameter local model.
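One way to wire the switch is an environment variable. A sketch assuming a hypothetical `LOCAL_TRIAGE` env var; the names are illustrative:

```python
import os

def model_config():
    """Pick hosted vs. local settings from a (hypothetical) LOCAL_TRIAGE env var."""
    if os.environ.get("LOCAL_TRIAGE"):
        return {"base_url": "http://localhost:11434/v1",
                "api_key": "ollama", "model": "llama3.1"}
    return {"base_url": None, "api_key": None, "model": "gpt-4o-mini"}

# Pass base_url/api_key through to OpenAI(...) when they are set.
```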
## Things to know

- Only `--unread`. The agent never re-classifies what it already drafted. If you re-read a draft, the original message stays read; the next cron run skips it.
- No auto-send. Replies always land in drafts. The cost of a wrong send (to the wrong person, in the wrong tone) is way higher than the friction of one extra click.
- Tune per inbox. Eng inboxes hit URGENT differently than sales inboxes. Customize the four categories and the prompt to your context.
## Next steps

- Email support agent — adds knowledge-base lookup
- Reach inbox zero — interactive variant
- Build an LLM agent with email & calendar tools