Voice agents have a slightly different shape from text agents — speech in, speech out, hard latency targets, no scrollback. But the email/calendar integration is the same idea: define tools, run them as subprocesses, return results to the LLM. This recipe shows the LiveKit / Vapi / generic patterns side by side, plus the voice-specific UX rules that make the experience feel like a real assistant rather than a robot reading mail at you.
## The flow

```
speech → STT → LLM (function-calling) → subprocess(nylas …) → JSON → LLM → TTS → speech
```

The agent transcribes the user, the LLM decides on a tool, the runtime spawns `nylas <command> --json`, the result comes back, the LLM composes a spoken response, TTS speaks it. The CLI absorbs every provider difference, so the agent is identical against Gmail, Microsoft 365, Exchange, Yahoo, iCloud, or IMAP.
## LiveKit Agents

LiveKit's `@function_tool()` decorator is the cleanest path:
```python
from livekit.agents import function_tool
import subprocess

@function_tool()
async def list_recent_emails(limit: int = 5) -> str:
    """List the last few emails. Keep limit small for voice."""
    out = subprocess.run(
        ["nylas", "email", "list", "--limit", str(limit), "--json"],
        capture_output=True, text=True, timeout=30,
    )
    return out.stdout if out.returncode == 0 else "Could not fetch emails."
```

The decorator turns the function into a tool definition the agent sees. Nothing else is special; everything you know from a normal Python LiveKit agent applies.
## Vapi (webhook-based)

Vapi posts JSON to your backend when the LLM calls a tool. Your handler executes the CLI and returns the result in Vapi's expected envelope:
```js
app.post("/vapi/tools", async (req, res) => {
  const { name, parameters } = req.body.message.toolCall;
  const args = ["nylas", "email", "list", "--limit", String(parameters.limit ?? 5), "--json"];
  const result = await execAsync(args, { timeout: 30000 });
  res.json({
    results: [{
      toolCallId: req.body.message.toolCall.id,
      result: result.stdout,
    }],
  });
});
```

## Generic (Retell, Bland.ai, OpenAI Realtime)
The pattern is the same as the LLM agent recipe — define tool schemas, dispatch to subprocess wrappers, return results. The voice runtime is just the I/O layer around it.
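That dispatch layer can be sketched in a few lines. This is an illustrative sketch, not any runtime's real API: `TOOLS` and `dispatch` are hypothetical names, and only the `nylas email list` invocation comes from this recipe.

```python
import json
import subprocess

# Map each tool name the LLM can call to a CLI argument builder.
# Only list_recent_emails is shown; add one entry per tool.
TOOLS = {
    "list_recent_emails": lambda a: [
        "nylas", "email", "list", "--limit", str(a.get("limit", 5)), "--json"
    ],
}

def dispatch(tool_name: str, arguments: dict) -> str:
    """Turn a tool call from the voice runtime into a CLI invocation."""
    builder = TOOLS.get(tool_name)
    if builder is None:
        return json.dumps({"error": f"unknown tool: {tool_name}"})
    out = subprocess.run(builder(arguments), capture_output=True, text=True, timeout=30)
    return out.stdout if out.returncode == 0 else "Could not fetch emails."
```

Whatever `dispatch` returns is handed straight back to the LLM as the tool result, the same as in the LiveKit and Vapi examples.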
## Voice-specific UX rules

These aren't optional — voice surfaces every UX mistake immediately:
- **Cap list responses at 5.** Reading a 50-message inbox out loud takes minutes. Default to `--limit 5` and let the user say "more".
- **Summarize, don't read.** Don't TTS the full subject + sender + snippet for each message. Have the LLM produce "You've got three emails from Ada about the contract, one from accounting, and a calendar invite from Rin" and let the user drill in.
- **Confirm before send. Always.** Speech-to-text mishears recipients and subjects in ways that send the wrong mail to the wrong person:

  ```
  AGENT: "Send to Ada at acme.test, subject 'pricing', body 'I'm in'?"
  USER: "Yes."
  ```

  Only after the explicit yes does the agent invoke `send_email`.
- **Translate errors.** "Error 401: invalid grant" is not a voice response. Map errors to short user-friendly lines: "I couldn't fetch your email — you may need to re-authenticate."
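The error-translation rule can be a small lookup. The substrings matched here are assumptions about what the CLI might emit (they are not a documented error list), and `speakable_error` is a hypothetical helper name:

```python
# Map fragments of raw CLI errors to short lines TTS can read.
FRIENDLY = {
    "401": "I couldn't fetch your email — you may need to re-authenticate.",
    "timed out": "I'm having trouble reaching email right now.",
    "rate limit": "Email is busy right now. Give me a few seconds and ask again.",
}

def speakable_error(stderr: str) -> str:
    """Return a speakable line instead of a raw error message."""
    low = stderr.lower()
    for marker, line in FRIENDLY.items():
        if marker in low:
            return line
    return "Something went wrong with email. Want me to try again?"
```

Run `speakable_error` over `stderr` before the text ever reaches the LLM or TTS, so a raw stack trace can never be spoken.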
## Set the timeout

Subprocess calls must have a timeout. Voice users won't wait 60 seconds; the framework's silence detection will kick in and the conversation falls apart. 30 seconds is the right number for both LiveKit and Vapi-style flows:

```python
subprocess.run([...], timeout=30)
```

If the CLI hits the timeout, return a graceful "I'm having trouble reaching email right now" instead of bubbling up the exception.
## Why subprocess, not MCP

MCP is great for chat agents that speak JSON-RPC natively. Voice runtimes generally don't — they expect function-call-style tools where you hand back a JSON blob. Subprocess + `--json` is a cleaner fit for the voice request/response model than running an MCP server alongside the voice runtime.
## Things to know

- **Active grant.** Voice agents serving multiple users need per-user grant routing. Either run a CLI process per user with their grant active, or pass `--api-key` and `--grant-id` explicitly per command.
- **Audit logs are still useful.** Even for voice, log every send to your own store — recipient, subject, agent run id, and approval source.
- **Latency budget.** Aim for subprocess round-trip under 2 seconds. `nylas email list --limit 5 --json` is comfortably under; large mailbox lists may not be.
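The per-command credential routing can be sketched as an argument builder. The `--api-key` and `--grant-id` flags come from the list above; `user_grants` and its contents are hypothetical placeholders for your own credential store:

```python
# Hypothetical per-user credential store; in practice, load from a
# secrets manager keyed by your application's user id.
user_grants = {
    "user-123": {"api_key": "nyk_...", "grant_id": "grant-abc"},
}

def cli_args_for(user_id: str, base: list[str]) -> list[str]:
    """Append the caller's credentials so one process can serve many users."""
    creds = user_grants[user_id]
    return base + ["--api-key", creds["api_key"], "--grant-id", creds["grant_id"]]
```

Build every command through this helper and a single backend process can serve concurrent calls without switching the active grant between them.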
## Next steps

- Build an LLM agent with email & calendar tools
- Use Nylas MCP with Claude Code — if your runtime does support MCP
- Build a Manus skill for Nylas