
Summarize email threads with AI


A support escalation that’s been bouncing between four people for two weeks is 30 messages deep by the time it reaches you. Reading the whole thing to catch up takes ten minutes you don’t have. A two-sentence summary at the top, “customer can’t log in after the SSO migration, eng identified the cause, fix ships Thursday,” gets you there in ten seconds.

This recipe builds that summary. It fetches every message in a thread, orders them, and condenses the conversation with a language model, including the map-reduce pattern you need when a thread runs longer than a single model context window.

How do you summarize an email thread with AI?


You summarize a thread in three steps: fetch the thread with the Threads API to get the IDs of every message, fetch each message body, then pass the ordered conversation to a language model for a short summary. The thread object unifies Gmail’s threading and the other providers’ conversation models into one shape across all 6 providers.

The same code summarizes a Gmail thread and an Outlook thread identically. The summary is a derived value, so cache it against the thread ID and refresh it only when a new reply lands. A typical 12-message thread summarizes in one model call for well under a cent.

Start with GET /v3/grants/{grant_id}/threads/{thread_id}, which returns the conversation’s participants, subject, and a message_ids array listing every message in the thread. The thread object itself carries only one body, in latest_draft_or_message, so message_ids is the list you iterate to pull each full message. A thread can hold more than 50 messages, so this two-step fetch is deliberate.

The function below reads the thread, then fetches each message body by ID. Fetching per message rather than paging the whole mailbox keeps you to exactly the messages in this conversation. For the thread object’s full field list, see Read a single message or thread.

import os

import requests
from bs4 import BeautifulSoup

NYLAS = "https://api.us.nylas.com"
HEADERS = {"Authorization": f"Bearer {os.environ['NYLAS_API_KEY']}"}

def fetch_thread_messages(grant_id, thread_id):
    # Step 1: read the thread to get the ID of every message in it.
    t = requests.get(
        f"{NYLAS}/v3/grants/{grant_id}/threads/{thread_id}", headers=HEADERS)
    t.raise_for_status()
    out = []
    # Step 2: fetch each message by ID and reduce its HTML body to plain text.
    for mid in t.json()["data"]["message_ids"]:
        r = requests.get(
            f"{NYLAS}/v3/grants/{grant_id}/messages/{mid}", headers=HEADERS)
        r.raise_for_status()
        m = r.json()["data"]
        body = BeautifulSoup(m["body"], "html.parser").get_text(" ", strip=True)
        out.append({"date": m["date"], "from": m["from"][0]["email"], "body": body})
    return out

A summary only reads correctly if the model sees the conversation in the order it happened, so sort the messages by their date field before building the prompt. Each message carries a date as a Unix timestamp, and none of the 6 providers guarantees that message_ids comes back in chronological order. Sorting the messages yourself takes one line and removes a whole class of “the model summarized the reply before the question” bugs.

The snippet below sorts the fetched messages and formats them into a single transcript with a sender label per turn. Labeling each turn with the sender lets the model attribute decisions to the right person, which matters when the summary needs to say who agreed to what. Trimming each body to its first 1,000 characters keeps a long thread inside budget without losing the gist.

def build_transcript(messages):
    ordered = sorted(messages, key=lambda m: m["date"])
    return "\n\n".join(
        f"{m['from']}: {m['body'][:1000]}" for m in ordered
    )

For a thread that fits the context window, the summary is a single completion. Give the model a system prompt that fixes the format (two or three sentences, plus any decisions and open action items) and pass the transcript as the user turn. Use a temperature of 0.3 so the summary reads naturally without drifting from the facts. A thread of 12 messages runs about 4,000 tokens and costs roughly $0.001 with GPT-4o-mini.

The call below returns a short summary string. Asking explicitly for decisions and action items, not just a recap, is what makes the summary useful for catching up rather than a shorter version of the same wall of text.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize(transcript):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0.3,
        messages=[
            # The system prompt fixes the output format.
            {"role": "system", "content":
                "Summarize this email thread in 2-3 sentences. "
                "End with any decisions made and open action items."},
            {"role": "user", "content": transcript},
        ],
    )
    return resp.choices[0].message.content
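
The cost figure quoted for a 12-message thread can be sanity-checked with simple arithmetic. The sketch below assumes GPT-4o-mini list prices of $0.15 per million input tokens and $0.60 per million output tokens, which may have changed since this was written. The estimate_cost helper and the 150-token output size are illustrative assumptions, not part of either API.

```python
# Assumed list prices in USD per 1M tokens; check current pricing before relying on them.
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60


def estimate_cost(input_tokens, output_tokens):
    """Return the approximate USD cost of one completion call."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# A 4,000-token transcript with an assumed 150-token summary.
cost = estimate_cost(4_000, 150)
```

Under those assumptions the call comes in below a tenth of a cent, which is consistent with the rough $0.001 figure.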

Handle threads longer than the context window

Section titled “Handle threads longer than the context window”

Most threads fit a modern context window, but a months-long escalation can run past it. When the transcript exceeds your token budget, switch to map-reduce: summarize each message or small batch on its own, then summarize the summaries. A thread of 200 messages that won’t fit in one context window reduces cleanly in 2 passes this way.
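
One way to decide when to switch is a cheap size check before the model call. The sketch below uses a rough heuristic of about 4 characters per token, which is an assumption rather than a real tokenizer count. The function names and the default budget are hypothetical, so set the budget from your model's actual limit.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate; ~4 characters per token is a heuristic, not a tokenizer."""
    return len(text) // chars_per_token + 1


def needs_map_reduce(transcript, budget_tokens=100_000):
    """Return True when the transcript is likely too large for a single call."""
    return estimate_tokens(transcript) > budget_tokens
```

A real tokenizer gives a tighter number, but an approximate check is usually enough if the budget leaves headroom for the system prompt and the response.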

The structure is a loop, not a new API. Map each chunk to a one-line summary, concatenate those lines, and run a final reduce pass over them for the headline summary. This keeps any single model call well inside the limit. It costs a few more calls, but the size of each call stays bounded no matter how long the thread grows. Run the map calls concurrently to keep latency flat.
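
The loop described above can be sketched as follows. This is a minimal version under stated assumptions: turns is a chronologically ordered list of "sender: body" strings, and summarize_fn is any callable that maps text to a shorter string, such as the summarize function shown earlier. The names chunked and map_reduce_summary and the default batch size are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def map_reduce_summary(turns, summarize_fn, batch_size=10, max_workers=8):
    """Summarize batches of turns concurrently, then summarize the summaries."""
    if not turns:
        return ""
    batches = ["\n\n".join(batch) for batch in chunked(turns, batch_size)]
    if len(batches) == 1:
        # Short thread: a single call is enough, no reduce pass needed.
        return summarize_fn(batches[0])
    # Map pass: pool.map returns results in input order, so chronology is kept.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(summarize_fn, batches))
    # Reduce pass: one final call over the concatenated partial summaries.
    return summarize_fn("\n".join(partials))
```

With a batch size of 10, a 200-message thread becomes 20 map calls and one reduce call. Threads are a reasonable fit here because the work is network-bound, and a thread long enough that the partial summaries themselves overflow the budget would need another reduce level.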

Summaries are expensive to recompute and cheap to store, so write each one to your database keyed on the thread ID. Reads then hit your cache in under 50 ms instead of regenerating, which matters on an inbox view that renders dozens of summaries at once. Stamp each stored summary with the timestamp of the latest message it covered.
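
A minimal sketch of that cache, assuming SQLite as the store. The table name, column names, and helper functions are hypothetical, and any database with a key lookup works the same way. The stored timestamp lets a read detect that the thread has gained a newer message than the summary covers.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open the summary cache, creating the table on first use."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS thread_summaries ("
        "thread_id TEXT PRIMARY KEY, "
        "summary TEXT NOT NULL, "
        "latest_message_date INTEGER NOT NULL)"
    )
    return db


def save_summary(db, thread_id, summary, latest_message_date):
    """Insert or overwrite the summary stored for a thread."""
    db.execute(
        "INSERT OR REPLACE INTO thread_summaries VALUES (?, ?, ?)",
        (thread_id, summary, latest_message_date),
    )
    db.commit()


def load_summary(db, thread_id, latest_message_date):
    """Return the cached summary if it covers the newest message, else None."""
    row = db.execute(
        "SELECT summary, latest_message_date FROM thread_summaries "
        "WHERE thread_id = ?",
        (thread_id,),
    ).fetchone()
    if row and row[1] >= latest_message_date:
        return row[0]
    return None
```

A None result from load_summary means the summary is missing or stale, so the caller regenerates it and calls save_summary with the new timestamp.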

Refresh on change rather than on a schedule. Subscribe to the message.created webhook, and when a new message lands in a thread, re-summarize only that thread. (thread.replied fires only for tracked threads you sent with tracking enabled, so message.created is the right trigger for incoming replies.) That keeps summaries current without re-running the model over a static archive every night. The webhook wiring is in Receive real-time webhooks.
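
The refresh decision can be sketched as a small function that inspects a parsed webhook body. The payload shape here is an assumption: the event type is taken to sit at the top level under type, and the message under data.object with grant_id and thread_id fields. Confirm it against the webhook reference before relying on it. The function name thread_to_refresh is hypothetical.

```python
def thread_to_refresh(event):
    """Return (grant_id, thread_id) for a message.created event, else None.

    `event` is the webhook body parsed from JSON into a dict. The layout
    (type at the top level, message under data.object) is an assumed shape.
    """
    if event.get("type") != "message.created":
        return None
    obj = event.get("data", {}).get("object", {})
    grant_id = obj.get("grant_id")
    thread_id = obj.get("thread_id")
    if not grant_id or not thread_id:
        return None
    return grant_id, thread_id
```

Your webhook endpoint would call this on each delivery and, when it returns a pair, queue a re-summarization for just that thread.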

Summaries inherit the model’s blind spots, so they’re a navigation aid, not a system of record. A summary compresses a thread of 30 messages into 3 sentences, so it drops detail by design, and sometimes the dropped detail turns out to matter. Always link the summary back to the full thread and never delete the source. For decisions with legal or financial weight, treat the summary as a pointer and have a human read the original.

Privacy is the same calculation as for any inbox integration. A thread can contain contracts, personal data, and internal strategy, so decide whether full bodies may leave your infrastructure or whether the job calls for a local model. The trust-boundary detail is in Connect an LLM to a user’s inbox.