Connect voice agents to email & calendar

Voice agents have a slightly different shape from text agents — speech in, speech out, hard latency targets, no scrollback. But the email/calendar integration is the same idea: define tools, run them as subprocesses, return results to the LLM. This recipe shows the LiveKit / Vapi / generic patterns side by side, plus the voice-specific UX rules that make the experience feel like a real assistant rather than a robot reading mail at you.

This recipe assumes the Nylas CLI is installed and authenticated. The CLI subcommands referenced below are documented in the nylas email list and nylas calendar events list command pages.

speech → STT → LLM (function-calling) → subprocess(nylas …) → JSON → LLM → TTS → speech

The agent transcribes the user, the LLM decides on a tool, the runtime spawns nylas <command> --json, the result comes back, the LLM composes a spoken response, TTS speaks it. The CLI absorbs every provider difference, so the agent is identical against Gmail, Microsoft 365, Exchange, Yahoo, iCloud, or IMAP.

LiveKit’s @function_tool() decorator is the cleanest path:

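import asyncio
import subprocess

from livekit.agents import function_tool


@function_tool()
async def list_recent_emails(limit: int = 5) -> str:
    """List the last few emails. Keep limit small for voice."""
    try:
        # Run the blocking call in a worker thread so the event loop keeps serving audio.
        out = await asyncio.to_thread(
            subprocess.run,
            ["nylas", "email", "list", "--limit", str(limit), "--json"],
            capture_output=True, text=True, timeout=30,
        )
    except subprocess.TimeoutExpired:
        return "I'm having trouble reaching email right now."
    return out.stdout if out.returncode == 0 else "Could not fetch emails."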

The decorator turns the function into a tool definition the agent sees. Nothing else is special; everything you know from a normal Python LiveKit agent applies.

Vapi posts JSON to your backend when the LLM calls a tool. Your handler executes the CLI and returns the result in Vapi’s expected envelope:

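const { execFile } = require("node:child_process");
const { promisify } = require("node:util");

// execFile takes the binary plus an argument array, so no shell parsing is involved.
const execFileAsync = promisify(execFile);

app.post("/vapi/tools", async (req, res) => {
  const { id, parameters } = req.body.message.toolCall;
  const args = ["email", "list",
    "--limit", String(parameters?.limit ?? 5),
    "--json"];
  const result = await execFileAsync("nylas", args, { timeout: 30000 });
  res.json({
    results: [{
      toolCallId: id,
      result: result.stdout,
    }],
  });
});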

Generic (Retell, Bland.ai, OpenAI Realtime)

The pattern is the same as the LLM agent recipe — define tool schemas, dispatch to subprocess wrappers, return results. The voice runtime is just the I/O layer around it.
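As a rough illustration of that pattern, here is a minimal sketch of a tool schema plus a dispatch table. The schema layout, the TOOLS and HANDLERS names, and the dispatch function are hypothetical; each runtime (Retell, Bland.ai, OpenAI Realtime) has its own envelope for declaring tools and delivering calls, so treat this as the shape of the idea rather than any platform's API.

```python
import subprocess

# Hypothetical schema in the JSON-schema style that most function-calling
# runtimes accept; the exact envelope differs per platform.
TOOLS = [{
    "name": "list_recent_emails",
    "description": "List the most recent emails. Keep limit small for voice.",
    "parameters": {
        "type": "object",
        "properties": {"limit": {"type": "integer", "default": 5}},
    },
}]


def list_recent_emails(limit=5, run=subprocess.run):
    """Subprocess wrapper. `run` is injectable so it can be stubbed in tests."""
    out = run(
        ["nylas", "email", "list", "--limit", str(limit), "--json"],
        capture_output=True, text=True, timeout=30,
    )
    return out.stdout if out.returncode == 0 else "Could not fetch emails."


HANDLERS = {"list_recent_emails": list_recent_emails}


def dispatch(name, arguments, handlers=HANDLERS):
    """Route a tool call from the voice runtime to its subprocess wrapper."""
    handler = handlers.get(name)
    if handler is None:
        return f"Unknown tool: {name}"
    return handler(**arguments)
```

The runtime-specific code then shrinks to parsing the incoming tool call into a name and an arguments dict, calling dispatch, and wrapping the returned string in whatever response format the platform expects.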

These aren’t optional — voice surfaces every UX mistake immediately:

  1. Cap list responses at 5. Reading a 50-message inbox out loud takes minutes. Default --limit 5 and let the user say “more”.

  2. Summarize, don’t read. Don’t TTS the full subject + sender + snippet for each message. Have the LLM produce “You’ve got three emails from Ada about the contract, one from accounting, and a calendar invite from Rin” and let the user drill in.

  3. Confirm before send. Always. Always. Speech-to-text mishears recipients and subjects in ways that send the wrong mail to the wrong person:

    AGENT: "Send to Ada at acme.test, subject 'pricing', body 'I'm in'?"
    USER: "Yes."

    Only after the explicit yes does the agent invoke send_email.

  4. Translate errors. “Error 401: invalid grant” is not a voice response. Map errors to short user-friendly lines: “I couldn’t fetch your email — you may need to re-authenticate.”
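Rules 3 and 4 can be sketched in a few lines. This is an illustrative example only: SPOKEN_ERRORS, to_spoken_error, and SendGate are hypothetical names, the error substrings are assumptions about what the raw error text might contain, and the send callable stands in for whatever actually sends the mail.

```python
# Hypothetical mapping from substrings of raw error text to speakable lines.
SPOKEN_ERRORS = {
    "401": "I couldn't fetch your email. You may need to re-authenticate.",
    "timed out": "I'm having trouble reaching email right now.",
}
DEFAULT_ERROR = "Something went wrong with email. Please try again."


def to_spoken_error(raw_error):
    """Return a short, speakable line for a raw error string."""
    lowered = raw_error.lower()
    for needle, spoken in SPOKEN_ERRORS.items():
        if needle in lowered:
            return spoken
    return DEFAULT_ERROR


class SendGate:
    """Holds a drafted email until the user explicitly confirms it."""

    def __init__(self):
        self.pending = None

    def draft(self, to, subject, body):
        """Store the draft and return the confirmation prompt to speak."""
        self.pending = {"to": to, "subject": subject, "body": body}
        return f"Send to {to}, subject '{subject}', body '{body}'?"

    def confirm(self, user_reply, send):
        """Call `send(draft)` only on an explicit yes; otherwise discard it."""
        draft, self.pending = self.pending, None
        if draft is None:
            return "There is nothing to send."
        normalized = user_reply.strip().lower().rstrip(".!")
        if normalized not in {"yes", "yes, send it", "send it"}:
            return "Okay, I won't send it."
        send(draft)
        return "Sent."
```

Clearing the pending draft on every reply, including a "no", means a stale draft can never be sent by a later, unrelated "yes".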

Subprocess calls must have a timeout. Voice users won't wait 60 seconds; the framework's silence detection will kick in and the conversation falls apart. Treat 30 seconds as the hard ceiling for both LiveKit and Vapi-style flows, with typical calls finishing far sooner:

subprocess.run([...], timeout=30)

If the CLI hits the timeout, return a graceful “I’m having trouble reaching email right now” instead of bubbling up the exception.
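A minimal sketch of that fallback, assuming a small wrapper around subprocess.run. The run_cli name and the FALLBACK constant are illustrative; the wrapper also treats a missing binary or a non-zero exit the same way, which is a design choice rather than something the CLI requires.

```python
import subprocess

FALLBACK = "I'm having trouble reaching email right now."


def run_cli(cmd, timeout=30):
    """Run a CLI command; return stdout, or a speakable fallback on failure."""
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return FALLBACK
    except FileNotFoundError:
        # The binary is not installed or not on PATH.
        return FALLBACK
    return out.stdout if out.returncode == 0 else FALLBACK
```

For example, run_cli(["nylas", "email", "list", "--limit", "5", "--json"]) returns either the JSON output or a sentence that is safe to hand straight to the LLM or TTS.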

MCP is great for chat agents that speak JSON-RPC natively. Voice runtimes generally don’t — they expect function-call-style tools where you hand back a JSON blob. Subprocess + --json is a cleaner fit for the voice request/response model than running an MCP server alongside the voice runtime.

  • Active grant. Voice agents serving multiple users need per-user grant routing. Either run a CLI process per user with their grant active, or pass --api-key and --grant-id explicitly per command.
  • Audit logs are still useful. Even for voice, log every send to your own store — recipient, subject, agent run id, and approval source.
  • Latency budget. Aim for subprocess round-trip under 2 seconds. nylas email list --limit 5 --json is comfortably under; large mailbox lists may not be.
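The three points above might look roughly like this in code. Everything here is a sketch under assumptions: build_command, audit_record, and timed are hypothetical helpers, the position of --api-key and --grant-id in the argument list is assumed rather than confirmed, and the audit fields simply mirror the ones named in the list.

```python
import json
import time


def build_command(subcommand, api_key, grant_id):
    """Append explicit credentials so each call targets one user's grant.

    Flag placement after the subcommand is an assumption; check the CLI help.
    """
    return ["nylas", *subcommand,
            "--api-key", api_key, "--grant-id", grant_id, "--json"]


def audit_record(recipient, subject, run_id, approval_source):
    """Serialize one send as a JSON line for your own log store."""
    return json.dumps({
        "ts": time.time(),
        "recipient": recipient,
        "subject": subject,
        "agent_run_id": run_id,
        "approval_source": approval_source,
    })


def timed(fn, *args, budget_s=2.0):
    """Call fn and report (result, elapsed seconds, whether it met the budget)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= budget_s
```

Wrapping each tool call in timed during development gives a quick read on which commands stay inside the 2-second budget and which need a smaller limit.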