
Clean message HTML and quoted text


Raw email bodies are messy: nested quoted replies, HTML signatures, tracking images, and “Sent from my iPhone” footers. If you’re showing a message preview or feeding text to a language model, that noise hurts. Stripping it with regular expressions is a losing game, because every provider structures its HTML differently.

The Nylas clean messages API does the parsing for you. You point it at one or more messages and it returns just the meaningful body, with toggles to drop links, images, tables, and boilerplate phrases.

Send a PUT /v3/grants/{grant_id}/messages/clean request with a message_id array of the messages to clean. Nylas returns each message with a cleaned body that drops signatures, quoted history, and other clutter. Because message_id is an array, you can clean up to 20 messages in a single request instead of making one call per message.
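If you have more than 20 messages to clean, you need to split them into batches first. The sketch below shows one way to do that with the standard library. The helper names batch_message_ids and build_clean_bodies are hypothetical, the message IDs are placeholders, and each body sets only message_id, so add option fields as needed.

```python
import json

# Per the limit above: at most 20 message IDs per clean request.
MAX_IDS_PER_REQUEST = 20


def batch_message_ids(message_ids, size=MAX_IDS_PER_REQUEST):
    """Split a list of message IDs into chunks of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [message_ids[i:i + size] for i in range(0, len(message_ids), size)]


def build_clean_bodies(message_ids):
    """Return one JSON request body per batch, each carrying a message_id array."""
    return [json.dumps({"message_id": batch}) for batch in batch_message_ids(message_ids)]


if __name__ == "__main__":
    ids = [f"msg-{n}" for n in range(45)]  # hypothetical message IDs
    bodies = build_clean_bodies(ids)
    print(len(bodies))  # 3 batches: 20 + 20 + 5
```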

You can clean a single message and enable several options in the same request.
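The following minimal sketch builds such a request with Python’s standard library. It assumes the US-region base URL https://api.us.nylas.com and bearer authentication with a Nylas API key. The grant ID, API key, and message ID are placeholders, and the function names are hypothetical. The request enables ignore_links, ignore_images, and remove_conclusion_phrases.

```python
import json
import urllib.request

# Assumption: US-region base URL; use your own region's host if it differs.
API_BASE = "https://api.us.nylas.com"


def build_clean_request(grant_id, api_key, message_ids, **options):
    """Build (but don't send) a PUT request to the clean messages endpoint."""
    body = {"message_id": list(message_ids), **options}
    return urllib.request.Request(
        url=f"{API_BASE}/v3/grants/{grant_id}/messages/clean",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed API-key bearer auth
            "Content-Type": "application/json",
        },
    )


def send(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


if __name__ == "__main__":
    req = build_clean_request(
        "GRANT_ID", "NYLAS_API_KEY", ["MESSAGE_ID"],  # placeholders
        ignore_links=True,
        ignore_images=True,
        remove_conclusion_phrases=True,
    )
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it lets you inspect the URL and body before any network call. Call send(req) once real credentials are in place.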

Alongside message_id, the request body accepts 6 fields that control the output. ignore_links and ignore_images strip those elements, while images_as_markdown keeps images in the output as markdown. ignore_tables removes table tags but keeps the row text, remove_conclusion_phrases cuts sign-offs like “Best regards”, and html_as_markdown returns the cleaned body as markdown instead of the default plain text.

Set html_as_markdown to true when the cleaned text is headed for a language model, since markdown keeps structure like headings and lists that the default plain text drops. It’s a beta flag, and it requires images_as_markdown to be true as well.
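One way to avoid sending an invalid combination is to keep the 6 toggles in a small options object that checks the html_as_markdown dependency before building the body. This is a hypothetical helper, not part of a Nylas SDK. It sends every toggle explicitly, defaulting to false, rather than relying on server-side defaults.

```python
from dataclasses import dataclass, asdict


@dataclass
class CleanOptions:
    """The 6 output toggles; this sketch defaults each one to False."""
    ignore_links: bool = False
    ignore_images: bool = False
    images_as_markdown: bool = False
    ignore_tables: bool = False
    remove_conclusion_phrases: bool = False
    html_as_markdown: bool = False  # beta flag

    def to_body(self, message_ids):
        """Merge the toggles with a message_id array into a request body dict."""
        # html_as_markdown requires images_as_markdown to be true as well.
        if self.html_as_markdown and not self.images_as_markdown:
            raise ValueError("html_as_markdown requires images_as_markdown=True")
        return {"message_id": list(message_ids), **asdict(self)}


# Hypothetical preset for text headed to a language model: markdown, no sign-offs.
LLM_PRESET = CleanOptions(
    images_as_markdown=True,
    html_as_markdown=True,
    remove_conclusion_phrases=True,
)
```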

Keep a couple of things in mind whichever fields you set. Cleaning is heuristic: it’s very good at common signature and quote patterns, but an unusual layout can leave a fragment or trim a line you wanted, so spot-check on real mail before you rely on it in a pipeline. The call returns cleaned text in the response and doesn’t modify the stored message, so the original stays intact on the provider.
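To make spot-checking cheaper, you could flag cleaned bodies that still look like they contain leftovers and review only those. The patterns below are illustrative guesses for common reply and footer fragments. They are not the rules Nylas uses, so tune them to your own mail. The needs_review helper is hypothetical.

```python
import re

# Illustrative patterns for leftovers worth a human look; adjust to your mail.
SUSPICIOUS = [
    re.compile(r"^\s*>", re.MULTILINE),                               # quoted-reply prefix
    re.compile(r"^On .+ wrote:\s*$", re.MULTILINE),                   # reply header line
    re.compile(r"^Sent from my \w+", re.MULTILINE | re.IGNORECASE),   # mobile footer
]


def needs_review(cleaned):
    """Return reasons a cleaned body deserves a manual spot-check (empty list if none)."""
    reasons = []
    if not cleaned.strip():
        reasons.append("empty body")
    for pattern in SUSPICIOUS:
        if pattern.search(cleaned):
            reasons.append(f"matched {pattern.pattern!r}")
    return reasons
```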

This pairs naturally with Smart Compose and triage agents: clean the inbound message first, then feed the tidy text to whatever reads it next.
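Here is a rough sketch of that hand-off: pull the cleaned text out of the response and pass each body to whatever runs next. It assumes the response looks like {"data": [{"id": ..., "conversation": ...}]}, with the cleaned text in a conversation field. Confirm the field names against the API reference before relying on them. Both function names are hypothetical, and next_stage stands in for your own compose or triage step.

```python
import json


def cleaned_texts(response_json):
    """Map message ID -> cleaned text, assuming a data array with id and conversation."""
    payload = json.loads(response_json)
    return {item["id"]: item.get("conversation", "") for item in payload.get("data", [])}


def run_pipeline(response_json, next_stage):
    """Feed each cleaned body to the next consumer and collect its results by message ID."""
    return {msg_id: next_stage(text) for msg_id, text in cleaned_texts(response_json).items()}
```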