
How to build a multi-turn email conversation

A single send-and-forget email is easy. A conversation that spans five exchanges over three days is harder. The agent needs to remember what it said, what the other person said, what it’s waiting for, and where in the workflow it is — across process restarts, deploy cycles, and hours of silence between replies.

This recipe builds that loop: the agent sends, waits for a reply, restores context, decides what to say, replies, and waits again. It runs entirely on webhooks and the Threads API so there’s no polling and no missed messages.

  1. Send an initial outbound message and persist the conversation state.
  2. Receive replies via message.created webhook.
  3. Restore the full conversation history from the thread.
  4. Feed the history to an LLM to generate the next reply.
  5. Send the reply in-thread and update the state.
  6. Handle the conversation lifecycle: escalation, completion, and dormancy.
You'll need:

  • An Agent Account with a message.created webhook subscribed. See Give your agent its own email.
  • A durable data store (Postgres, Redis, DynamoDB, or similar) for conversation state. In-memory won’t survive restarts, and email conversations span hours.
  • Access to an LLM for generating replies. The examples keep this abstract — any OpenAI-compatible API works.

Every active conversation needs a record that maps the Nylas thread_id to the agent’s internal state.

// What the agent stores per conversation
const conversationRecord = {
  threadId: "nylas-thread-id",
  grantId: AGENT_GRANT_ID,
  contactEmail: "[email protected]",
  contactName: "Alice Chen",
  purpose: "demo_followup", // What started this conversation
  step: "awaiting_reply", // Where in the workflow we are
  turnCount: 1, // How many exchanges have happened
  maxTurns: 10, // Safety cap before escalation
  createdAt: "2026-04-14T10:00:00Z",
  lastActivityAt: "2026-04-14T10:00:00Z",
  metadata: {}, // Workflow-specific data
};

The step field is the heart of it. It tracks what the agent is waiting for and determines how it handles the next inbound message.
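A sketch of what those step values might look like as a small state machine. The names beyond awaiting_reply, escalated, and completed (which appear later in this recipe) are illustrative, define whichever steps your workflow actually needs:

```javascript
// Illustrative step values for the conversation state machine.
const STEPS = {
  AWAITING_REPLY: "awaiting_reply", // sent a message, waiting on the contact
  AWAITING_CONFIRMATION: "awaiting_confirmation", // asked a yes/no question
  ESCALATED: "escalated", // handed to a human; the agent stays quiet
  COMPLETED: "completed", // purpose fulfilled; the thread is done
};

// Terminal steps: inbound messages on these threads should not
// trigger an automatic LLM reply.
function isTerminal(step) {
  return step === STEPS.ESCALATED || step === STEPS.COMPLETED;
}
```

Checking `isTerminal(conversation.step)` early in the webhook handler keeps the agent from replying on threads a human has taken over or that are already done.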

When the agent initiates contact, it sends the first message and creates the conversation record.

async function startConversation({ to, subject, body, purpose, metadata }) {
  const sent = await nylas.messages.send({
    identifier: AGENT_GRANT_ID,
    requestBody: {
      to: [{ email: to.email, name: to.name }],
      subject,
      body,
    },
  });
  // Persist the conversation state keyed by thread_id.
  await db.conversations.create({
    threadId: sent.data.threadId,
    grantId: AGENT_GRANT_ID,
    contactEmail: to.email,
    contactName: to.name,
    purpose,
    step: "awaiting_reply",
    turnCount: 1,
    maxTurns: 10,
    createdAt: new Date().toISOString(),
    lastActivityAt: new Date().toISOString(),
    metadata: metadata ?? {},
  });
  return sent.data;
}

When a reply arrives, the webhook handler looks up the conversation, rebuilds history, and passes it to the LLM.

app.post("/webhooks/nylas", async (req, res) => {
  // Acknowledge immediately so Nylas doesn't retry while we process.
  res.status(200).end();
  const event = req.body;
  if (event.type !== "message.created") return;
  const msg = event.data.object;
  if (msg.grant_id !== AGENT_GRANT_ID) return;
  // Skip messages the agent sent (outbound fires message.created too).
  const agentEmail = "[email protected]";
  if (msg.from?.[0]?.email === agentEmail) return;
  const conversation = await db.conversations.findByThreadId(msg.thread_id);
  if (!conversation) {
    // New inbound message, not a reply to something we sent.
    await triageNewInbound(msg);
    return;
  }
  await continueConversation(msg, conversation);
});
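What triageNewInbound does is workflow-specific, but its first decision is usually whether an unsolicited message is worth handling at all. A minimal sketch of that filter, assuming only the sender address is available (a fuller version might also fetch headers like Auto-Submitted to catch out-of-office replies):

```javascript
// Illustrative first-pass filter for unsolicited inbound mail:
// drop obvious machine senders before routing anything to a workflow.
function shouldHandleNewInbound(msg) {
  const from = msg.from?.[0]?.email?.toLowerCase() ?? "";
  if (!from) return false;
  // Common automated sender patterns: no-reply@, mailer-daemon@, etc.
  return !/^(no-?reply|mailer-daemon|postmaster|bounce)@/.test(from);
}
```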

Fetch the full thread from Nylas so the LLM has the complete conversation, not just the latest message.

async function continueConversation(msg, conversation) {
  // Check lifecycle constraints first, before any API calls.
  if (conversation.turnCount >= conversation.maxTurns) {
    await escalate(conversation, "max turns reached");
    return;
  }
  // Pull the entire thread so the LLM sees the full exchange. The webhook
  // payload only carries summary fields, so fetch each message in full.
  const thread = await nylas.threads.find({
    identifier: AGENT_GRANT_ID,
    threadId: conversation.threadId,
  });
  const allMessages = await Promise.all(
    thread.data.messageIds.map((id) =>
      nylas.messages.find({ identifier: AGENT_GRANT_ID, messageId: id }),
    ),
  );
  // Format as a conversation transcript for the LLM, oldest first.
  const transcript = allMessages
    .map((m) => m.data)
    .sort((a, b) => a.date - b.date)
    .map((m) => ({
      role: m.from[0].email === "[email protected]" ? "agent" : "contact",
      body: m.body,
      date: new Date(m.date * 1000).toISOString(),
    }));
  // Generate the reply.
  const replyBody = await llm.generateReply({
    purpose: conversation.purpose,
    step: conversation.step,
    transcript,
    metadata: conversation.metadata,
  });
  // Send in-thread.
  await nylas.messages.send({
    identifier: AGENT_GRANT_ID,
    requestBody: {
      replyToMessageId: msg.id,
      to: [{ email: conversation.contactEmail, name: conversation.contactName }],
      subject: `Re: ${thread.data.subject}`,
      body: replyBody.text,
    },
  });
  // Update state.
  await db.conversations.update(conversation.threadId, {
    step: replyBody.nextStep ?? "awaiting_reply",
    turnCount: conversation.turnCount + 1,
    lastActivityAt: new Date().toISOString(),
    metadata: { ...conversation.metadata, ...replyBody.metadata },
  });
}

The LLM receives the full transcript and the current workflow step, so it can generate a contextually appropriate reply. It also returns a nextStep value that advances the conversation state machine.
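One way llm.generateReply might assemble its prompt, assuming an OpenAI-compatible chat API. The function name, the system prompt, and the JSON contract ("text" plus "nextStep") are illustrative, not part of any SDK; the prompt builder is kept pure so the network call stays a thin wrapper around it:

```javascript
// Map the recipe's transcript format onto chat-API roles. The agent's own
// messages become "assistant" turns; the contact's become "user" turns.
function buildPromptMessages({ purpose, step, transcript }) {
  const system =
    `You are an email agent. Conversation purpose: ${purpose}. ` +
    `Current workflow step: ${step}. Reply concisely. Return JSON with ` +
    `"text" (the reply body) and "nextStep" (the next workflow step).`;
  return [
    { role: "system", content: system },
    ...transcript.map((t) => ({
      role: t.role === "agent" ? "assistant" : "user",
      content: t.body,
    })),
  ];
}
```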

Not every conversation ends neatly. Build handlers for the edges.

When the agent hits its turn limit, encounters a topic it can’t handle, or detects frustration, hand the conversation to a human.

async function escalate(conversation, reason) {
  await db.conversations.update(conversation.threadId, {
    step: "escalated",
    metadata: { ...conversation.metadata, escalationReason: reason },
  });
  // Notify the human -- Slack, PagerDuty, internal API, whatever fits.
  await notifyHumanOperator({
    threadId: conversation.threadId,
    contact: conversation.contactEmail,
    reason,
  });
}
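A hypothetical notifyHumanOperator using a Slack incoming webhook, one of the options mentioned above. SLACK_WEBHOOK_URL is an assumed environment variable, and the message formatter is split out so it can be tested without the network call:

```javascript
// Build the Slack payload separately from sending it.
function formatEscalationMessage({ threadId, contact, reason }) {
  return {
    text:
      `Email agent escalation\n` +
      `Contact: ${contact}\nThread: ${threadId}\nReason: ${reason}`,
  };
}

// Post the escalation to a Slack incoming webhook.
async function notifyHumanOperator(details) {
  await fetch(process.env.SLACK_WEBHOOK_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(formatEscalationMessage(details)),
  });
}
```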

When the agent determines the conversation’s purpose is fulfilled (the prospect booked a meeting, the support question was answered), mark it done so future messages on the same thread get handled correctly.

async function completeConversation(conversation) {
  await db.conversations.update(conversation.threadId, {
    step: "completed",
    lastActivityAt: new Date().toISOString(),
  });
}

Someone might reply to a conversation that’s been inactive for weeks. Decide up front what happens: re-read the thread and resume, escalate, or send a fresh introduction.

// In the webhook handler, before calling continueConversation:
const hoursSinceLastActivity =
  (Date.now() - new Date(conversation.lastActivityAt).getTime()) / 3600000;
if (hoursSinceLastActivity > 168) {
  // Over a week of silence -- escalate instead of auto-replying.
  await escalate(conversation, "dormant thread reopened after 7+ days");
  return;
}
  • Filter out the agent’s own messages. message.created fires for outbound too. If you don’t check msg.from, the agent will try to reply to itself.
  • Batch rapid replies. If someone sends two messages in quick succession, a short delay (30-60 seconds) before responding lets you treat them as one turn instead of generating two separate replies.
  • Cap the conversation length. An unbounded conversation loop is a token sink and a risk. The maxTurns field is there for a reason — set it based on what’s realistic for the workflow.
  • Persist conversation state durably. Redis with AOF, Postgres, DynamoDB — anything that survives restarts. The gap between messages can be days.
  • The LLM doesn’t need every message. For long threads, summarize earlier messages and pass only the last 3-4 in full. This keeps token usage reasonable without losing critical context.
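The "batch rapid replies" tip above can be sketched as a per-thread debouncer: each inbound message resets a short timer, and the handler runs once, with the latest message, after the quiet period. The delay would be 30-60 seconds in production; it's a parameter here so the sketch stays easy to test:

```javascript
// Debounce inbound messages per thread so back-to-back replies
// produce a single agent turn instead of two.
function createReplyDebouncer(handle, delayMs = 45_000) {
  const timers = new Map(); // threadId -> pending timer
  return function onInbound(threadId, msg) {
    clearTimeout(timers.get(threadId));
    timers.set(
      threadId,
      setTimeout(() => {
        timers.delete(threadId);
        handle(threadId, msg); // only the latest message triggers a reply
      }, delayMs),
    );
  };
}
```

In the webhook handler you would call the debounced function instead of continueConversation directly. Note the sketch only keeps the latest message; since continueConversation refetches the whole thread anyway, earlier messages in the burst still reach the LLM.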