
Connect voice agents to email & calendar

Voice agents have a slightly different shape from text agents — speech in, speech out, hard latency targets, no scrollback. But the email/calendar integration is the same idea: define tools, run them as subprocesses, return results to the LLM. This recipe shows the LiveKit / Vapi / generic patterns side by side, plus the voice-specific UX rules that make the experience feel like a real assistant rather than a robot reading mail at you.

speech → STT → LLM (function-calling) → subprocess(nylas …) → JSON → LLM → TTS → speech

The agent transcribes the user, the LLM decides on a tool, the runtime spawns nylas <command> --json, the result comes back, the LLM composes a spoken response, TTS speaks it. The CLI absorbs every provider difference, so the agent is identical against Gmail, Microsoft 365, Exchange, Yahoo, iCloud, or IMAP.

LiveKit’s @function_tool() decorator is the cleanest path:

from livekit.agents import function_tool
import subprocess

@function_tool()
async def list_recent_emails(limit: int = 5) -> str:
    """List the last few emails. Keep limit small for voice."""
    out = subprocess.run(
        ["nylas", "email", "list", "--limit", str(limit), "--json"],
        capture_output=True, text=True, timeout=30,
    )
    return out.stdout if out.returncode == 0 else "Could not fetch emails."

The decorator turns the function into a tool definition the agent sees. Nothing else is special; everything you know from a normal Python LiveKit agent applies.

Vapi posts JSON to your backend when the LLM calls a tool. Your handler executes the CLI and returns the result in Vapi’s expected envelope:

import express from "express";
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);
const app = express();
app.use(express.json());

app.post("/vapi/tools", async (req, res) => {
  const toolCall = req.body.message.toolCall;
  // A real handler dispatches on toolCall.name; this shows one tool.
  const args = ["email", "list",
                "--limit", String(toolCall.parameters.limit ?? 5),
                "--json"];
  const result = await execFileAsync("nylas", args, { timeout: 30000 });
  res.json({
    results: [{
      toolCallId: toolCall.id,
      result: result.stdout,
    }],
  });
});

Generic (Retell, Bland.ai, OpenAI Realtime)

The pattern is the same as the LLM agent recipe — define tool schemas, dispatch to subprocess wrappers, return results. The voice runtime is just the I/O layer around it.
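That dispatch layer can be sketched in a few lines. The tool schema below is in the OpenAI function-calling shape most voice runtimes accept; the HANDLERS registry and dispatch helper are illustrative names, not part of any runtime's API:

```python
import json
import subprocess

# Tool schema in the OpenAI function-calling shape; voice runtimes
# generally accept this format (or a close variant) for tool definitions.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "list_recent_emails",
        "description": "List the last few emails. Keep limit small for voice.",
        "parameters": {
            "type": "object",
            "properties": {"limit": {"type": "integer", "default": 5}},
        },
    },
}]

def list_recent_emails(limit=5):
    """Subprocess wrapper: the CLI absorbs the provider differences."""
    out = subprocess.run(
        ["nylas", "email", "list", "--limit", str(limit), "--json"],
        capture_output=True, text=True, timeout=30,
    )
    return out.stdout if out.returncode == 0 else "Could not fetch emails."

# Map tool names (from the schema) to their subprocess wrappers.
HANDLERS = {"list_recent_emails": list_recent_emails}

def dispatch(tool_call):
    """Route a function call from any voice runtime to its wrapper.

    Accepts arguments either as a JSON string (how most runtimes
    deliver them) or as an already-parsed dict.
    """
    fn = HANDLERS[tool_call["name"]]
    args = tool_call["arguments"]
    if isinstance(args, str):
        args = json.loads(args)
    return fn(**args)
```

The runtime-specific part shrinks to a thin adapter: unpack the tool call from whatever envelope the runtime posts, call dispatch, and wrap the string result back up.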

These voice-UX rules aren’t optional — voice surfaces every UX mistake immediately:

  1. Cap list responses at 5. Reading a 50-message inbox out loud takes minutes. Default --limit 5 and let the user say “more”.

  2. Summarize, don’t read. Don’t TTS the full subject + sender + snippet for each message. Have the LLM produce “You’ve got three emails from Ada about the contract, one from accounting, and a calendar invite from Rin” and let the user drill in.

  3. Confirm before send. Always. Always. Speech-to-text mishears recipients and subjects in ways that send the wrong mail to the wrong person:

    AGENT: "Send to Ada at acme.test, subject 'pricing', body 'I'm in'?"
    USER: "Yes."

    Only after the explicit yes does the agent invoke send_email.

  4. Translate errors. “Error 401: invalid grant” is not a voice response. Map errors to short user-friendly lines: “I couldn’t fetch your email — you may need to re-authenticate.”
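The confirm-before-send rule boils down to a small gate that holds the draft until an explicit yes. A minimal sketch (SendGate and its method names are made up for illustration):

```python
class SendGate:
    """Hold a drafted send until the user explicitly confirms by voice."""

    def __init__(self):
        self.pending = None

    def propose(self, to, subject, body):
        """Stash the draft and return the confirmation line for TTS."""
        self.pending = {"to": to, "subject": subject, "body": body}
        return f"Send to {to}, subject '{subject}', body '{body}'?"

    def confirm(self, heard_yes):
        """Return the draft only on an explicit yes; otherwise discard it."""
        if not heard_yes or self.pending is None:
            self.pending = None
            return None  # nothing is sent
        draft, self.pending = self.pending, None
        return draft  # caller invokes send_email with this
```

The key property: send_email is only reachable through confirm(True), so a mis-heard "send it to..." can never fire the tool directly.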

Subprocess calls must have a timeout. Voice users won’t wait 60 seconds; the framework’s silence detection will kick in and the conversation falls apart. 30 seconds is the right number for both LiveKit and Vapi-style flows:

subprocess.run([...], timeout=30)

If the CLI hits the timeout, return a graceful “I’m having trouble reaching email right now” instead of bubbling up the exception.
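A minimal wrapper covering both the timeout and the error-translation rule might look like this. The spoken error lines come from the rules above; the returncode branch is a stand-in, since a real mapping depends on what the CLI actually prints to stderr:

```python
import subprocess

def safe_run(cmd, timeout=30):
    """Run a CLI command; on failure, return a line the agent can speak."""
    try:
        out = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Never bubble the exception up into the voice loop.
        return "I'm having trouble reaching email right now."
    if out.returncode != 0:
        # Translate raw stderr into something speakable (rule 4 above);
        # inspect out.stderr here to pick the right line.
        return "I couldn't fetch your email. You may need to re-authenticate."
    return out.stdout
```

Every string this function returns is safe to hand straight to TTS, which keeps the error-handling decision out of the LLM's hands.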

MCP is great for chat agents that speak JSON-RPC natively. Voice runtimes generally don’t — they expect function-call-style tools where you hand back a JSON blob. Subprocess + --json is a cleaner fit for the voice request/response model than running an MCP server alongside the voice runtime.

  • Active grant. Voice agents serving multiple users need per-user grant routing. Either run a CLI process per user with their grant active, or pass --api-key and --grant-id explicitly per command.
  • Audit logs are still useful. Even for voice, log every send to your own store — recipient, subject, agent run id, and approval source.
  • Latency budget. Aim for subprocess round-trip under 2 seconds. nylas email list --limit 5 --json is comfortably under; large mailbox lists may not be.
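For the explicit-flags option in the first bullet, a per-user command builder is enough. The user-record field names here are assumptions about your own store; --api-key and --grant-id are the flags mentioned above:

```python
def cmd_for_user(user, subcommand_args):
    """Build a per-user CLI invocation with explicit credentials.

    `user` is your own user record; "api_key" and "grant_id" are
    assumed field names on it.
    """
    return [
        "nylas", *subcommand_args,
        "--api-key", user["api_key"],
        "--grant-id", user["grant_id"],
        "--json",
    ]
```

Every tool wrapper then takes the current caller's record instead of relying on whichever grant happens to be active in the CLI process.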